Feed fetched in 46 ms.
Content type is application/xml; charset=utf-8.
Feed is 174,459 characters long.
Warning Feed is missing an ETag.
Feed has a last modified date of Tue, 27 Jan 2026 16:58:08 GMT.
Feed is well-formed XML.
Warning Feed has no styling.
This is an Atom feed.
Feed title: Simon Willison's Weblog
Error Feed self link: http://simonwillison.net/atom/everything/ does not match feed URL: https://simonwillison.net/atom/everything/.
Warning Feed is missing an image.
Feed has 30 items.
First item published on 2026-01-27T16:58:08.000Z
Last item published on 2026-01-11T22:59:23.000Z
All items have published dates.
Newest item was published on 2026-01-27T16:58:08.000Z.
Home page URL: http://simonwillison.net/
Error Home page URL is on a different protocol: http:.
Warning Home page URL redirected to https://simonwillison.net/.
Home page has feed discovery link in <head>.
Home page has a link to the feed in the <body>
<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom">
<title>Simon Willison's Weblog</title>
<link href="http://simonwillison.net/" rel="alternate"/>
<link href="http://simonwillison.net/atom/everything/" rel="self"/>
<id>http://simonwillison.net/</id>
<updated>2026-01-27T16:58:08+00:00</updated>
<author>
<name>Simon Willison</name>
</author>
<entry>
<title>One Human + One Agent = One Browser From Scratch</title>
<link href="https://simonwillison.net/2026/Jan/27/one-human-one-agent-one-browser/#atom-everything" rel="alternate"/>
<published>2026-01-27T16:58:08+00:00</published>
<updated>2026-01-27T16:58:08+00:00</updated>
<id>https://simonwillison.net/2026/Jan/27/one-human-one-agent-one-browser/#atom-everything</id>
<summary type="html"><p><strong><a href="https://emsh.cat/one-human-one-agent-one-browser/">One Human + One Agent = One Browser From Scratch</a></strong></p>
<p>embedding-shapes was <a href="https://emsh.cat/cursor-implied-success-without-evidence/">so infuriated</a> by the hype around Cursor's <a href="https://simonwillison.net/2026/Jan/23/fastrender/">FastRender browser project</a> - thousands of parallel agents producing ~1.6 million lines of Rust - that they were inspired to have a go at building a web browser using coding agents themselves.</p>
<p>The result is <a href="https://github.com/embedding-shapes/one-agent-one-browser">one-agent-one-browser</a> and it's <em>really</em> impressive. Over three days they drove a single Codex CLI agent to build 20,000 lines of Rust that successfully renders HTML+CSS with no Rust crate dependencies at all - though it does (reasonably) use Windows, macOS and Linux system frameworks for image and text rendering.</p>
<p>I installed the <a href="https://github.com/embedding-shapes/one-agent-one-browser/releases/tag/0.1.0">1MB macOS binary release</a> and ran it against my blog:</p>
<pre><code>chmod 755 ~/Downloads/one-agent-one-browser-macOS-ARM64
~/Downloads/one-agent-one-browser-macOS-ARM64 https://simonwillison.net/
</code></pre>
<p>Here's the result:</p>
<p><img alt="My blog rendered in a window. Everything is in the right place, the CSS gradients look good, the feed subscribe SVG icon is rendered correctly but there's a missing PNG image." src="https://static.simonwillison.net/static/2026/one-agent-simonwillison.jpg" /></p>
<p>It even rendered my SVG feed subscription icon! A PNG image is missing from the page, which looks like an intermittent bug (there's code to render PNGs).</p>
<p>The code is pretty readable too - here's <a href="https://github.com/embedding-shapes/one-agent-one-browser/blob/0.1.0/src/layout/flex.rs">the flexbox implementation</a>.</p>
<p>I had thought that "build a web browser" was the ideal prompt to really stretch the capabilities of coding agents - and that it would take sophisticated multi-agent harnesses (as seen in the Cursor project) and millions of lines of code to achieve.</p>
<p>Turns out one agent driven by a talented engineer, three days and 20,000 lines of Rust is enough to get a very solid basic renderer working!</p>
<p>I'm going to upgrade my <a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#3-years-someone-will-build-a-new-browser-using-mainly-ai-assisted-coding-and-it-won-t-even-be-a-surprise">prediction for 2029</a>: I think we're going to get a <em>production-grade</em> web browser built by a small team using AI assistance by then.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46779522">Show Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/browsers">browsers</a>, <a href="https://simonwillison.net/tags/predictions">predictions</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/rust">rust</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/codex-cli">codex-cli</a>, <a href="https://simonwillison.net/tags/browser-challenge">browser-challenge</a></p></summary>
<category term="browsers"/>
<category term="predictions"/>
<category term="ai"/>
<category term="rust"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-assisted-programming"/>
<category term="coding-agents"/>
<category term="codex-cli"/>
<category term="browser-challenge"/>
</entry>
<entry>
<title>Kimi K2.5: Visual Agentic Intelligence</title>
<link href="https://simonwillison.net/2026/Jan/27/kimi-k25/#atom-everything" rel="alternate"/>
<published>2026-01-27T15:07:41+00:00</published>
<updated>2026-01-27T15:07:41+00:00</updated>
<id>https://simonwillison.net/2026/Jan/27/kimi-k25/#atom-everything</id>
<summary type="html"><p><strong><a href="https://www.kimi.com/blog/kimi-k2-5.html">Kimi K2.5: Visual Agentic Intelligence</a></strong></p>
<p>Kimi K2 landed <a href="https://simonwillison.net/2025/Jul/11/kimi-k2/">in July</a> as a 1 trillion parameter open weight LLM. It was joined by Kimi K2 Thinking <a href="https://simonwillison.net/2025/Nov/6/kimi-k2-thinking/">in November</a> which added reasoning capabilities. Now they've made it multi-modal: the K2 models were text-only, but the new 2.5 can handle image inputs as well:</p>
<blockquote>
<p>Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.</p>
</blockquote>
<p>The "self-directed agent swarm paradigm" claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once:</p>
<blockquote>
<p>For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.</p>
</blockquote>
<p>I used the <a href="https://openrouter.ai/moonshotai/kimi-k2.5">OpenRouter Chat UI</a> to have it "Generate an SVG of a pelican riding a bicycle", and it did <a href="https://gist.github.com/simonw/32a85e337fbc6ee935d10d89726c0476">quite well</a>:</p>
<p><img alt="Cartoon illustration of a white pelican with a large orange beak and yellow throat pouch riding a green bicycle with yellow feet on the pedals, set against a light blue sky with soft bokeh circles and a green grassy hill. The bicycle frame is a little questionable. The pelican is quite good. The feet do not quite align with the pedals, which are floating clear of the frame." src="https://static.simonwillison.net/static/2026/kimi-k2.5-pelican.png" /></p>
<p>As a more interesting test, I decided to exercise the claims around multi-agent planning with this prompt:</p>
<blockquote>
<p>I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.</p>
</blockquote>
<p>Here's <a href="https://gist.github.com/simonw/ee2583b2eb5706400a4737f56d57c456">the full response</a>. It produced ten realistic tasks and reasoned through the dependencies between them. For comparison here's the same prompt <a href="https://claude.ai/share/df9258e7-97ba-4362-83da-76d31d96196f">against Claude Opus 4.5</a> and <a href="https://chatgpt.com/share/6978d48c-3f20-8006-9c77-81161f899104">against GPT-5.2 Thinking</a>.</p>
<p>The <a href="https://huggingface.co/moonshotai/Kimi-K2.5">Hugging Face repository</a> is 595GB. The model uses Kimi's janky "modified MIT" license, which adds the following clause:</p>
<blockquote>
<p>Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.</p>
</blockquote>
<p>Given the model's size, I expect one way to run it locally would be with MLX and a pair of $10,000 512GB RAM M3 Ultra Mac Studios. That setup has <a href="https://twitter.com/awnihannun/status/1943723599971443134">been demonstrated to work</a> with previous trillion parameter K2 models.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46775961">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/hugging-face">hugging-face</a>, <a href="https://simonwillison.net/tags/vision-llms">vision-llms</a>, <a href="https://simonwillison.net/tags/llm-tool-use">llm-tool-use</a>, <a href="https://simonwillison.net/tags/ai-agents">ai-agents</a>, <a href="https://simonwillison.net/tags/pelican-riding-a-bicycle">pelican-riding-a-bicycle</a>, <a href="https://simonwillison.net/tags/llm-release">llm-release</a>, <a href="https://simonwillison.net/tags/ai-in-china">ai-in-china</a>, <a href="https://simonwillison.net/tags/moonshot">moonshot</a>, <a href="https://simonwillison.net/tags/parallel-agents">parallel-agents</a>, <a href="https://simonwillison.net/tags/kimi">kimi</a>, <a href="https://simonwillison.net/tags/janky-licenses">janky-licenses</a></p></summary>
<category term="ai"/>
<category term="llms"/>
<category term="hugging-face"/>
<category term="vision-llms"/>
<category term="llm-tool-use"/>
<category term="ai-agents"/>
<category term="pelican-riding-a-bicycle"/>
<category term="llm-release"/>
<category term="ai-in-china"/>
<category term="moonshot"/>
<category term="parallel-agents"/>
<category term="kimi"/>
<category term="janky-licenses"/>
</entry>
<entry>
<title>Tips for getting coding agents to write good Python tests</title>
<link href="https://simonwillison.net/2026/Jan/26/tests/#atom-everything" rel="alternate"/>
<published>2026-01-26T23:55:29+00:00</published>
<updated>2026-01-26T23:55:29+00:00</updated>
<id>https://simonwillison.net/2026/Jan/26/tests/#atom-everything</id>
<summary type="html"><p>Someone <a href="https://news.ycombinator.com/item?id=46765460#46765823">asked</a> on Hacker News if I had any tips for getting coding agents to write decent quality tests. Here's what I said:</p>
<hr />
<p>I work in Python which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including things like usage of fixture libraries for mocking external HTTP APIs and snapshot testing and other neat patterns.</p>
<p>Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.</p>
<p>Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code - which isn't a huge deal, I'm much more tolerant of duplicated logic in tests than I am in implementation, but it's still worth pushing back on.</p>
<p>"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.</p>
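<p>As a minimal sketch of the shape those two prompts tend to produce (the <code>slugify</code> function and the <code>separator</code> fixture are hypothetical, invented purely for illustration):</p>

```python
import pytest


def slugify(text, separator="-"):
    # Hypothetical function under test: lowercase the words and join them.
    return separator.join(text.lower().split())


@pytest.fixture
def separator():
    # Shared setup lives in one fixture instead of being repeated per test.
    return "_"


@pytest.mark.parametrize(
    "text,expected",
    [
        ("Hello World", "hello_world"),
        ("  extra   spaces  ", "extra_spaces"),
        ("single", "single"),
    ],
)
def test_slugify(separator, text, expected):
    # One parametrized test replaces three near-identical test functions.
    assert slugify(text, separator) == expected
```

<p>Running <code>pytest</code> against a file like this reports three separate test cases, one per parameter tuple, all sharing the same fixture.</p>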
<p>Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.</p>
<p>I find that once a project has clean basic tests the new tests added by the agents tend to match them in quality. It's similar to how working on large projects with a team of other developers works - keeping the code clean means when people look for examples of how to write a test they'll be pointed in the right direction.</p>
<p>One last tip I use a lot is this:</p>
<pre><code>Clone datasette/datasette-enrichments
from GitHub to /tmp and imitate the
testing patterns it uses
</code></pre>
<p>I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.</p>
<p>Tags: <a href="https://simonwillison.net/tags/testing">testing</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/python">python</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/hacker-news">hacker-news</a>, <a href="https://simonwillison.net/tags/pytest">pytest</a></p></summary>
<category term="testing"/>
<category term="coding-agents"/>
<category term="python"/>
<category term="generative-ai"/>
<category term="ai"/>
<category term="llms"/>
<category term="hacker-news"/>
<category term="pytest"/>
</entry>
<entry>
<title>ChatGPT Containers can now run bash, pip/npm install packages, and download files</title>
<link href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#atom-everything" rel="alternate"/>
<published>2026-01-26T19:19:31+00:00</published>
<updated>2026-01-26T19:19:31+00:00</updated>
<id>https://simonwillison.net/2026/Jan/26/chatgpt-containers/#atom-everything</id>
<summary type="html"><p>One of my favourite features of ChatGPT is its ability to write and execute code in a container. This feature launched as ChatGPT Code Interpreter <a href="https://simonwillison.net/2023/Apr/12/code-interpreter/">nearly three years ago</a>, was half-heartedly rebranded to "Advanced Data Analysis" at some point and is generally really difficult to find detailed documentation about. Case in point: it appears to have had a <em>massive</em> upgrade at some point in the past few months, and I can't find documentation about the new capabilities anywhere!</p>
<p>Here are the most notable new features:</p>
<ol>
<li>ChatGPT can <strong>directly run Bash commands</strong> now. Previously it was limited to Python code only, although it could run shell commands via the Python <code>subprocess</code> module.</li>
<li>
<strong>It has Node.js</strong> and can run JavaScript directly in addition to Python. I also got it to run "hello world" in <strong>Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C and C++</strong>. No Rust yet though!</li>
<li>While the container still can't make outbound network requests, <strong><code>pip install package</code> and <code>npm install package</code> both work</strong> now via a custom proxy mechanism.</li>
<li>ChatGPT can locate the URL for a file on the web and use a <code>container.download</code> tool to <strong>download that file and save it to a path</strong> within the sandboxed container.</li>
</ol>
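<p>For reference, the older workaround mentioned in the first item, running a shell command from Python via <code>subprocess</code>, looks roughly like this. It's a minimal sketch: the <code>run_shell</code> helper name and the <code>echo</code> command are placeholders of mine, not anything taken from ChatGPT's environment.</p>

```python
import subprocess


def run_shell(command):
    # Run a command string through the shell and capture its output as text.
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=False
    )
    return result.returncode, result.stdout, result.stderr


code, out, err = run_shell("echo hello from the shell")
print(code, out.strip())
```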
<p>This is a substantial upgrade! ChatGPT can now write and then test code in 10 new languages (11 if you count Bash), can find files online and download them into the container, and can install additional packages via <code>pip</code> and <code>npm</code> to help it solve problems.</p>
<p>(OpenAI <em>really</em> need to develop better habits at <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">keeping their release notes up-to-date</a>!)</p>
<p>I was initially suspicious that maybe I'd stumbled into a new preview feature that wasn't available to everyone, but I <a href="https://chatgpt.com/share/6977aa7c-7bd8-8006-8129-8c9e25126fed">tried some experiments</a> in a free ChatGPT account and confirmed that the new features are available there as well.</p>
<ul>
<li><a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#container-download">container.download</a></li>
<li><a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#is-container-download-a-data-exfiltration-vulnerability-">Is container.download a data exfiltration vulnerability?</a></li>
<li><a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#bash-and-other-languages">Bash and other languages</a></li>
<li><a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#installing-packages-from-pip-and-npm">Installing packages from pip and npm</a></li>
<li><a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#adding-it-all-together">Adding it all together</a></li>
</ul>
<h4 id="container-download">container.download</h4>
<p>My first clue to the new features came the other day when I got curious about Los Angeles air quality, in particular whether the growing number of electric vehicles there has had a measurable impact.</p>
<p>I prompted a fresh GPT-5.2 Thinking session with:</p>
<blockquote>
<p>Show me Los Angeles air quality over time for last 20 years</p>
</blockquote>
<p>Here's the <a href="https://chatgpt.com/share/6977a80d-e6e4-8006-b4fe-ed8478fe99d8">shared transcript</a>. Annoyingly that doesn't make the thinking trace available, so here's a screenshot:</p>
<p><img src="https://static.simonwillison.net/static/2026/container-download.jpg" alt="Exploring methods to fetch AQI data and reports Downloading AQI data for Los Angeles I found an xlsx file titled &quot;Air Quality Statistics by City, 2000-2023&quot;, which should include data for Los Angeles. While user requested data for Using provided URL to download file The URL from the error message leads to an Excel file containing air quality trends. I can use the container.download tool to fetch it. Let’s proceed with this approach and see if it works out for downloading the file directly! Analyzed Parsing and analyzing the data for Los Angeles Now that we've downloaded the data, I'll parse it using Python. First, I'll focus on identifying the row(s) corresponding to Los Angeles, which may be labeled under &quot;Los Angeles-Long Beach-Anaheim, CA.&quot; Afterward, I'll plot metrics like &quot;Median AQI&quot; and &quot;Days Good&quot; over time to visualize trends. Let’s begin!" style="max-width: 100%;" /></p>
<p>This bit instantly caught my eye:</p>
<blockquote>
<p>[...] The URL from the error message leads to an Excel file containing air quality trends. I can use the container.download tool to fetch it. [...]</p>
</blockquote>
<p>What's <code>container.download</code>? I <a href="https://chatgpt.com/share/6977aab0-b874-8006-9b54-d3d216ed884b">asked ChatGPT</a>:</p>
<blockquote>
<p>Without running any searches tell me about your container.download tool</p>
</blockquote>
<p>And got back:</p>
<blockquote>
<p>Sure. <code>container.download</code> is a built-in tool I can call that <strong>fetches a file from a URL and saves it into the sandboxed container filesystem</strong> for this chat session.</p>
<h4 id="what-it-does">What it does</h4>
<ul>
<li>Takes a <strong>publicly reachable URL</strong> and a <strong>destination filepath</strong> in the container.</li>
<li>Downloads the bytes from that URL and writes them to the given path.</li>
<li>After that, I can read/process the file locally in the container (e.g., unzip it, parse it with Python, open it as an image, convert it, etc.).</li>
</ul>
<h4 id="signature-inputs-">Signature (inputs)</h4>
<p>It’s called like this (conceptually):</p>
<div class="highlight highlight-source-json"><pre>{
<span class="pl-ent">"url"</span>: <span class="pl-s"><span class="pl-pds">"</span>https://example.com/somefile.zip<span class="pl-pds">"</span></span>,
<span class="pl-ent">"filepath"</span>: <span class="pl-s"><span class="pl-pds">"</span>/mnt/data/somefile.zip<span class="pl-pds">"</span></span>
}</pre></div>
</blockquote>
<p>Further experimentation confirmed that <code>container.download</code> behaves as described.</p>
<p>I had it download one file from a server with logs enabled and captured the IP address and user-agent. Here are the most interesting headers:</p>
<pre><code>User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
Accept: text/html, application/xhtml+xml, application/xml;q=0.9, image/avif, image/webp, image/apng, */*;q=0.8, application/signed-exchange;v=b3;q=0.9
Cf-Connecting-Ip: 52.230.164.178
</code></pre>
<p>That <code>52.230.164.178</code> IP address <a href="http://ip-api.com/json/52.230.164.178">resolves to</a> Microsoft Azure Cloud (centralus) in Des Moines, Iowa.</p>
<h4 id="is-container-download-a-data-exfiltration-vulnerability-">Is container.download a data exfiltration vulnerability?</h4>
<p>On the one hand, this is really useful! ChatGPT can navigate around websites looking for useful files, download those files to a container and then process them using Python or other languages.</p>
<p>Is this a data exfiltration vulnerability though? Could a prompt injection attack trick ChatGPT into leaking private data through a <code>container.download</code> call to a URL with a query string that includes sensitive information?</p>
<p>I don't think it can. I tried getting it to assemble a URL with a query string and access it using <code>container.download</code> and it couldn't do it. It told me that it got back this error:</p>
<blockquote>
<p>ERROR: download failed because url not viewed in conversation before. open the file or url using web.run first.</p>
</blockquote>
<p>This looks to me like the same safety trick <a href="https://simonwillison.net/2025/Sep/10/claude-web-fetch-tool/">used by Claude's Web Fetch tool</a>: only allow URL access if that URL was either directly entered by the user or if it came from search results that could not have been influenced by a prompt injection.</p>
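<p>To make that idea concrete, here's a toy model of a "previously viewed URLs only" gate. This is my guess at the general shape of the technique, not OpenAI's or Anthropic's actual implementation; the <code>DownloadGate</code> class, its method names and the <code>fake_fetch</code> stand-in are all hypothetical.</p>

```python
class DownloadGate:
    """Toy model of a URL allow-list; not any vendor's real code."""

    def __init__(self):
        # URLs the user typed or that appeared in trusted search results.
        self.viewed = set()

    def mark_viewed(self, url):
        self.viewed.add(url)

    def download(self, url, fetch):
        # Exact-match check: a URL with an attacker-built query string
        # was never viewed, so it is rejected before any request is made.
        if url not in self.viewed:
            raise PermissionError("url not viewed in conversation before")
        return fetch(url)


gate = DownloadGate()
gate.mark_viewed("https://example.com/data.xlsx")

# Stand-in for a real HTTP client so the sketch runs offline.
fake_fetch = lambda url: b"bytes from " + url.encode()

print(gate.download("https://example.com/data.xlsx", fake_fetch))
try:
    gate.download("https://example.com/data.xlsx?secret=hunter2", fake_fetch)
except PermissionError as error:
    print("blocked:", error)
```

<p>The important property is that the check happens on the full URL, query string included, so data smuggled into the query string produces a URL that was never viewed.</p>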
<p>(I poked at this a bit more and managed to get a simple constructed query string to pass through <code>web.run</code> - a different tool entirely - but when I tried to compose a longer query string containing the previous prompt history a <code>web.run</code> filter blocked it.)</p>
<p>So I <em>think</em> this is all safe, though I'm curious if it could hold firm against a more aggressive round of attacks from a seasoned security researcher.</p>
<h4 id="bash-and-other-languages">Bash and other languages</h4>
<p>The key lesson from coding agents like Claude Code and Codex CLI is that Bash rules everything: if an agent can run Bash commands in an environment it can do almost anything that can be achieved by typing commands into a computer.</p>
<p>When Anthropic added their own code interpreter feature to Claude <a href="https://simonwillison.net/2025/Sep/9/claude-code-interpreter/">last September</a> they built that around Bash rather than just Python. It looks to me like OpenAI have now done the same thing for ChatGPT.</p>
<p>Here's what ChatGPT looks like when it runs a Bash command - here my prompt was:</p>
<blockquote>
<p>npm install a fun package and demonstrate using it</p>
</blockquote>
<p><img src="https://static.simonwillison.net/static/2026/chatgpt-bash.jpg" alt="Screenshot of ChatGPT 5.2 Thinking interface with Activity panel. Main chat shows user prompt &quot;npm install a fun package and demonstrate using it&quot; and response &quot;Thought for 32s &gt; Yep — here's a fun one: cowsay 🐄 I installed it with npm and ran a tiny Node script:&quot; followed by bash code block containing &quot;mkdir -p /mnt/data/npmfun cd /mnt/data/npmfun npm init -y npm install cowsay --no-fund --no-audit node -e &quot;const cowsay=require('cowsay'); console.log(cowsay.say({text:'Hel&quot; and Output: section showing ASCII cow art saying &quot;&lt; Hello from npm inside the container! &gt;&quot; with cow drawing made of characters. Activity panel on right (32s) shows: &quot;Running npm install in container&quot; with bash command and output &quot;v22.16.0 10.9.2&quot;, &quot;Inspecting npm command output and container status&quot; with directory listing showing &quot;drwxr-xr-x 2 root oai_shared 80 Jan 26 17:32 . drwxrwxr-x 2 oai oai_shared 60 Jan 26 17:32 .. -rw-r--r-- 1 root root&quot;, &quot;Re-running npm install&quot; note about node_modules missing, and &quot;Rerunning npm install with output&quot; showing &quot;added 41 packages in 2s&quot;." style="max-width: 100%;" /></p>
<p>It's useful to click on the "Thinking" or "Thought for 32s" links as that opens the Activity sidebar with a detailed trace of what ChatGPT did to arrive at its answer. This helps guard against cheating - ChatGPT might claim to have run Bash in the main window but it can't fake those black and white logs in the Activity panel.</p>
<p>I had it run <a href="https://chatgpt.com/share/6977b890-f1c0-8006-b39a-9a02c7ca1993">Hello World in various languages</a> later in that same session.</p>
<p><img src="https://static.simonwillison.net/static/2026/chatgpt-hello-worlds.jpg" alt="Terminal screenshot showing &quot;Hello, world&quot; programs in multiple languages. Top section shows code: swift -e 'print(&quot;Hello, world from Swift!&quot;)' followed by cat &gt; hello.kts &lt;&lt;'KTS' / println(&quot;Hello, world from Kotlin!&quot;) / KTS / kotlinc -script hello.kts, then cat &gt; hello.c &lt;&lt;'C' / #include &lt;stdio.h&gt; / int main(){ puts(&quot;Hello, world from C!&quot;); return 0; } / C / gcc hello.c -o hello_c / ./hello_c, then cat &gt; hello.cpp &lt;&lt;'CPP' / #include &lt;iostream&gt; / int main(){ std::cout &lt;&lt; &quot;Hello, world from C++!&quot; &lt;&lt; std::endl; } / CPP / g++ hello.cpp -o hello_cpp / ./hello_cpp. Bottom section shows output: Hello, world from Ruby! / Hello, world from Perl! / Hello, world from PHP! / Hello, world from Go! / Hello, world from Java! / Hello, world from Swift! / Hello, world from Kotlin! / Hello, world from C! / Hello, world from C++!. UI shows &quot;Thought for 2m 29s&quot; and &quot;Done&quot; at bottom." style="max-width: 100%;" /></p>
<h4 id="installing-packages-from-pip-and-npm">Installing packages from pip and npm</h4>
<p>In the previous example ChatGPT installed the <code>cowsay</code> package from npm and used it to draw an ASCII-art cow. But how could it do that if the container can't make outbound network requests?</p>
<p>In <a href="https://chatgpt.com/share/69773501-b6d8-8006-bbf2-fa644561aa26">another session</a> I challenged it to explore its environment and figure out how that worked.</p>
<p>Here's <a href="https://github.com/simonw/research/blob/main/chatgpt-container-environment/README.md">the resulting Markdown report</a> it created.</p>
<p>The key magic appears to be an <code>applied-caas-gateway1.internal.api.openai.org</code> proxy, available within the container and with various packaging tools configured to use it.</p>
<p>The following environment variables cause <code>pip</code> and <code>uv</code> to install packages from that proxy instead of directly from PyPI:</p>
<pre><code>PIP_INDEX_URL=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../pypi-public/simple
PIP_TRUSTED_HOST=packages.applied-caas-gateway1.internal.api.openai.org
UV_INDEX_URL=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../pypi-public/simple
UV_INSECURE_HOST=https://packages.applied-caas-gateway1.internal.api.openai.org
</code></pre>
<p>This one appears to get <code>npm</code> to work:</p>
<pre><code>NPM_CONFIG_REGISTRY=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../npm-public
</code></pre>
<p>And it reported these suspicious looking variables as well:</p>
<pre><code>CAAS_ARTIFACTORY_BASE_URL=packages.applied-caas-gateway1.internal.api.openai.org
CAAS_ARTIFACTORY_PYPI_REGISTRY=.../artifactory/api/pypi/pypi-public
CAAS_ARTIFACTORY_NPM_REGISTRY=.../artifactory/api/npm/npm-public
CAAS_ARTIFACTORY_GO_REGISTRY=.../artifactory/api/go/golang-main
CAAS_ARTIFACTORY_MAVEN_REGISTRY=.../artifactory/maven-public
CAAS_ARTIFACTORY_GRADLE_REGISTRY=.../artifactory/gradle-public
CAAS_ARTIFACTORY_CARGO_REGISTRY=.../artifactory/api/cargo/cargo-public/index
CAAS_ARTIFACTORY_DOCKER_REGISTRY=.../dockerhub-public
CAAS_ARTIFACTORY_READER_USERNAME=reader
CAAS_ARTIFACTORY_READER_PASSWORD=****
NETWORK=caas_packages_only
</code></pre>
<p>Neither Rust nor Docker is installed in the container environment, but maybe those registry references are a clue to features still to come.</p>
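<p>If you want to look for this kind of configuration in an environment yourself, a short script along these lines should do it. It's a sketch under assumptions: the prefixes are the ones from the variables listed above, while the <code>redact</code> and <code>proxy_settings</code> helpers are hypothetical names of mine.</p>

```python
import os
import re

# Variable name prefixes taken from the report above.
PREFIXES = ("PIP_", "UV_", "NPM_CONFIG_", "CAAS_")


def redact(value):
    # Hide the password portion of any user:password@host credential.
    return re.sub(r"(://[^:/@]+:)[^@]+@", r"\1****@", value)


def proxy_settings(environ):
    # Keep only variables whose names start with one of the known prefixes.
    found = {}
    for name, value in sorted(environ.items()):
        if name.startswith(PREFIXES) or name == "NETWORK":
            found[name] = "****" if "PASSWORD" in name else redact(value)
    return found


for name, value in proxy_settings(os.environ).items():
    print(f"{name}={value}")
```

<p>Passing a plain dictionary instead of <code>os.environ</code> makes it easy to try out the redaction without touching a real environment.</p>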
<h4 id="adding-it-all-together">Adding it all together</h4>
<p>The result of all of this? You can tell ChatGPT to use Python or Node.js packages as part of a conversation and it will be able to install them and apply them against files you upload or that it downloads from the public web. That's <em>really</em> cool.</p>
<p>The big missing feature here should be the easiest to provide: we need <strong>official documentation</strong>! A release notes entry would be a good start, but there are a lot of subtle details to how this new stuff works, its limitations and what it can be used for.</p>
<p>As always, I'd also encourage OpenAI to come up with a name for this set of features that properly represents how it works and what it can do.</p>
<p>In the meantime, I'm going to call this <strong>ChatGPT Containers</strong>.</p>
<h4 id="list-of-tools">Update: a full list of tools</h4>
<p>I decided to ask ChatGPT about other tools that were available to it in case there was anything interesting in there:</p>
<blockquote><p>List all tools that are available to you, with their exact names and descriptions and signatures</p></blockquote>
<p>Here's <a href="https://chatgpt.com/share/6977ffa0-df14-8006-9647-2b8c90ccbb81">what I got back</a>.</p>
<details><summary>List of ChatGPT tools according to GPT-5.2 Thinking on 26th January 2026</summary>
<p style="margin-top: 1em">Below are <strong>all tools available to me in this chat</strong>, with their <strong>exact names</strong>, <strong>descriptions</strong>, and <strong>signatures</strong> (as defined in my tool interface).</p>
<hr />
<h4 id="python">python</h4>
<h5 id="python-exec-freeform-any">
<code>python.exec</code> (FREEFORM) → any</h5>
<p><strong>Description:</strong> Execute Python code in a Jupyter environment for <em>private/internal reasoning</em> (not shown to you). Internet access is disabled.<br />
<strong>Signature:</strong> <code>python.exec(FREEFORM) -&gt; any</code></p>
<hr />
<h4 id="web">web</h4>
<h5 id="web-run-toolcallv5-json-any">
<code>web.run</code> (ToolCallV5 JSON) → any</h5>
<p><strong>Description:</strong> Access the internet and specialized lookups (search, open/click/find pages, screenshot PDFs, image search, product search, finance, weather, sports, calculator, time). Supports citations via returned <code>turn...</code> reference IDs.<br />
<strong>Signature:</strong><br />
<code>web.run({ open?, click?, find?, screenshot?, image_query?, product_query?, sports?, finance?, weather?, calculator?, time?, search_query?, response_length? }) -&gt; any</code></p>
<hr />
<h4 id="automations">automations</h4>
<h5 id="automations-create-any">
<code>automations.create</code> → any</h5>
<p><strong>Description:</strong> Schedule a future/recurring task/reminder/search via iCal VEVENT.<br />
<strong>Signature:</strong><br />
<code>automations.create({ prompt: string, title: string, schedule?: string, dtstart_offset_json?: string }) -&gt; any</code></p>
<h5 id="automations-update-any">
<code>automations.update</code> → any</h5>
<p><strong>Description:</strong> Modify an existing scheduled task.<br />
<strong>Signature:</strong><br />
<code>automations.update({ jawbone_id: string, schedule?: string, dtstart_offset_json?: string, prompt?: string, title?: string, is_enabled?: boolean }) -&gt; any</code></p>
<h5 id="automations-list-any">
<code>automations.list</code> → any</h5>
<p><strong>Description:</strong> List existing automations.<br />
<strong>Signature:</strong> <code>automations.list() -&gt; any</code></p>
<hr />
<h4 id="guardian-tool">guardian_tool</h4>
<h5 id="guardian-tool-get-policy-str">
<code>guardian_tool.get_policy</code> → str</h5>
<p><strong>Description:</strong> Retrieve policy guidance for specific categories (currently: <code>election_voting</code>).<br />
<strong>Signature:</strong> <code>guardian_tool.get_policy({ category: string }) -&gt; str</code></p>
<hr />
<h4 id="file-search">file_search</h4>
<h5 id="file-search-msearch-any">
<code>file_search.msearch</code> → any</h5>
<p><strong>Description:</strong> Search across available internal/connected sources (in this environment: primarily “recording_knowledge” transcripts when relevant).<br />
<strong>Signature:</strong><br />
<code>file_search.msearch({ queries?: string[], source_filter?: string[], file_type_filter?: string[], intent?: string, time_frame_filter?: { start_date?: string, end_date?: string } }) -&gt; any</code></p>
<h5 id="file-search-mclick-any">
<code>file_search.mclick</code> → any</h5>
<p><strong>Description:</strong> Expand/open previously returned search results (or connector URLs) for more context.<br />
<strong>Signature:</strong><br />
<code>file_search.mclick({ pointers?: string[], start_date?: string, end_date?: string }) -&gt; any</code></p>
<hr />
<h4 id="gmail-read-only-">gmail (read-only)</h4>
<h5 id="gmail-search-email-ids-any">
<code>gmail.search_email_ids</code> → any</h5>
<p><strong>Description:</strong> Search Gmail message IDs by query/tags (read-only).<br />
<strong>Signature:</strong><br />
<code>gmail.search_email_ids({ query?: string, tags?: string[], max_results?: integer, next_page_token?: string }) -&gt; any</code></p>
<h5 id="gmail-batch-read-email-any">
<code>gmail.batch_read_email</code> → any</h5>
<p><strong>Description:</strong> Read a batch of Gmail messages by message IDs (read-only).<br />
<strong>Signature:</strong><br />
<code>gmail.batch_read_email({ message_ids: string[] }) -&gt; any</code></p>
<hr />
<h4 id="gcal-read-only-">gcal (read-only)</h4>
<h5 id="gcal-search-events-any">
<code>gcal.search_events</code> → any</h5>
<p><strong>Description:</strong> Search Google Calendar events in a time range and/or by keyword (read-only).<br />
<strong>Signature:</strong><br />
<code>gcal.search_events({ time_min?: string, time_max?: string, timezone_str?: string, max_results?: integer, query?: string, calendar_id?: string, next_page_token?: string }) -&gt; any</code></p>
<h5 id="gcal-read-event-any">
<code>gcal.read_event</code> → any</h5>
<p><strong>Description:</strong> Read a specific calendar event by ID (read-only).<br />
<strong>Signature:</strong><br />
<code>gcal.read_event({ event_id: string, calendar_id?: string }) -&gt; any</code></p>
<hr />
<h4 id="gcontacts-read-only-">gcontacts (read-only)</h4>
<h5 id="gcontacts-search-contacts-any">
<code>gcontacts.search_contacts</code> → any</h5>
<p><strong>Description:</strong> Search Google Contacts by keyword (read-only).<br />
<strong>Signature:</strong><br />
<code>gcontacts.search_contacts({ query: string, max_results?: integer }) -&gt; any</code></p>
<hr />
<h4 id="canmore-canvas-">canmore (canvas)</h4>
<h5 id="canmore-create-textdoc-any">
<code>canmore.create_textdoc</code> → any</h5>
<p><strong>Description:</strong> Create a new canvas text document (markdown or code).<br />
<strong>Signature:</strong><br />
<code>canmore.create_textdoc({ name: string, type: "document" | "code/bash" | "code/zsh" | "code/javascript" | "code/typescript" | "code/html" | "code/css" | "code/python" | "code/json" | "code/sql" | "code/go" | "code/yaml" | "code/java" | "code/rust" | "code/cpp" | "code/swift" | "code/php" | "code/xml" | "code/ruby" | "code/haskell" | "code/kotlin" | "code/csharp" | "code/c" | "code/objectivec" | "code/r" | "code/lua" | "code/dart" | "code/scala" | "code/perl" | "code/commonlisp" | "code/clojure" | "code/ocaml" | "code/powershell" | "code/verilog" | "code/dockerfile" | "code/vue" | "code/react" | "code/other", content: string }) -&gt; any</code></p>
<h5 id="canmore-update-textdoc-any">
<code>canmore.update_textdoc</code> → any</h5>
<p><strong>Description:</strong> Apply regex-based edits to the current canvas textdoc.<br />
<strong>Signature:</strong><br />
<code>canmore.update_textdoc({ updates: { pattern: string, multiple?: boolean, replacement: string }[] }) -&gt; any</code></p>
<h5 id="canmore-comment-textdoc-any">
<code>canmore.comment_textdoc</code> → any</h5>
<p><strong>Description:</strong> Add actionable comments on matched regions of the current canvas textdoc.<br />
<strong>Signature:</strong><br />
<code>canmore.comment_textdoc({ comments: { pattern: string, comment: string }[] }) -&gt; any</code></p>
<hr />
<h4 id="python-user-visible">python_user_visible</h4>
<h5 id="python-user-visible-exec-freeform-any">
<code>python_user_visible.exec</code> (FREEFORM) → any</h5>
<p><strong>Description:</strong> Execute Python code <em>that you will see</em> (tables, plots, generated files). Internet disabled.<br />
<strong>Signature:</strong> <code>python_user_visible.exec(FREEFORM) -&gt; any</code></p>
<hr />
<h4 id="user-info">user_info</h4>
<h5 id="user-info-get-user-info-any">
<code>user_info.get_user_info</code> → any</h5>
<p><strong>Description:</strong> Get your current location and local time (or UTC if unknown).<br />
<strong>Signature:</strong> <code>user_info.get_user_info({}) -&gt; any</code></p>
<hr />
<h4 id="summary-reader">summary_reader</h4>
<h5 id="summary-reader-read-any">
<code>summary_reader.read</code> → any</h5>
<p><strong>Description:</strong> Read safe-to-share internal reasoning summaries from earlier turns (used when asked how I arrived at something).<br />
<strong>Signature:</strong> <code>summary_reader.read({ limit?: integer, offset?: integer }) -&gt; any</code></p>
<hr />
<h4 id="container">container</h4>
<h5 id="container-exec-any">
<code>container.exec</code> → any</h5>
<p><strong>Description:</strong> Run a command inside the container and return stdout/stderr.<br />
<strong>Signature:</strong><br />
<code>container.exec({ cmd: string[], session_name?: string | null, workdir?: string | null, timeout?: integer | null, env?: object | null, user?: string | null }) -&gt; any</code></p>
<h5 id="container-feed-chars-any">
<code>container.feed_chars</code> → any</h5>
<p><strong>Description:</strong> Send input to a running interactive exec session and flush output.<br />
<strong>Signature:</strong><br />
<code>container.feed_chars({ session_name: string, chars: string, yield_time_ms?: integer }) -&gt; any</code></p>
<h5 id="container-open-image-any">
<code>container.open_image</code> → any</h5>
<p><strong>Description:</strong> Display an image from an absolute path in the container (jpg/jpeg/png/webp).<br />
<strong>Signature:</strong><br />
<code>container.open_image({ path: string, user?: string | null }) -&gt; any</code></p>
<h5 id="container-download-any">
<code>container.download</code> → any</h5>
<p><strong>Description:</strong> Download a file from a URL into the container filesystem.<br />
<strong>Signature:</strong><br />
<code>container.download({ url: string, filepath: string }) -&gt; any</code></p>
<hr />
<h4 id="bio-memory-">bio (memory)</h4>
<h5 id="bio-update-freeform-any">
<code>bio.update</code> (FREEFORM) → any</h5>
<p><strong>Description:</strong> Persist (or forget) user-specific info across conversations (“memory”).<br />
<strong>Signature:</strong> <code>bio.update(FREEFORM) -&gt; any</code></p>
<hr />
<h4 id="api-tool">api_tool</h4>
<h5 id="api-tool-list-resources-any">
<code>api_tool.list_resources</code> → any</h5>
<p><strong>Description:</strong> List available “resources” (including invokable tools) in an OP-style file tree.<br />
<strong>Signature:</strong><br />
<code>api_tool.list_resources({ path?: string, cursor?: string | null, only_tools?: boolean, refetch_tools?: boolean }) -&gt; any</code></p>
<h5 id="api-tool-call-tool-any">
<code>api_tool.call_tool</code> → any</h5>
<p><strong>Description:</strong> Invoke a resource as a tool (must know its full path/URI from <code>list_resources</code>).<br />
<strong>Signature:</strong><br />
<code>api_tool.call_tool({ path: string, args: object }) -&gt; any</code></p>
<hr />
<h4 id="personal-context">personal_context</h4>
<h5 id="personal-context-search-any">
<code>personal_context.search</code> → any</h5>
<p><strong>Description:</strong> Retrieve user-specific context from prior interactions/linked sources to maintain continuity.<br />
<strong>Signature:</strong><br />
<code>personal_context.search({ query: string }) -&gt; any</code></p>
<hr />
<h4 id="image-gen">image_gen</h4>
<h5 id="image-gen-text2im-any">
<code>image_gen.text2im</code> → any</h5>
<p><strong>Description:</strong> Generate or edit images from text instructions (and optionally referenced image IDs).<br />
<strong>Signature:</strong><br />
<code>image_gen.text2im({ prompt?: string | null, size?: string | null, n?: integer | null, transparent_background?: boolean | null, is_style_transfer?: boolean | null, referenced_image_ids?: string[] | null }) -&gt; any</code></p>
<hr />
<h4 id="user-settings">user_settings</h4>
<h5 id="user-settings-get-user-settings-any">
<code>user_settings.get_user_settings</code> → any</h5>
<p><strong>Description:</strong> Read current settings and allowed values (personality, accent color, appearance).<br />
<strong>Signature:</strong> <code>user_settings.get_user_settings() -&gt; any</code></p>
<h5 id="user-settings-set-setting-any">
<code>user_settings.set_setting</code> → any</h5>
<p><strong>Description:</strong> Change one setting (accent color, appearance, or personality) to an allowed value.<br />
<strong>Signature:</strong><br />
<code>user_settings.set_setting({ setting_name: "accent_color" | "appearance" | "personality", setting_value: string }) -&gt; any</code></p>
<hr />
<h4 id="artifact-handoff">artifact_handoff</h4>
<h5 id="artifact-handoff-prepare-artifact-generation-any">
<code>artifact_handoff.prepare_artifact_generation</code> → any</h5>
<p><strong>Description:</strong> Must be called immediately when the user asks for a spreadsheet or slide deck artifact.<br />
<strong>Signature:</strong> <code>artifact_handoff.prepare_artifact_generation() -&gt; any</code></p>
</details>
<p>Tags: <a href="https://simonwillison.net/tags/pypi">pypi</a>, <a href="https://simonwillison.net/tags/sandboxing">sandboxing</a>, <a href="https://simonwillison.net/tags/npm">npm</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/openai">openai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/chatgpt">chatgpt</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/code-interpreter">code-interpreter</a></p></summary>
<category term="pypi"/>
<category term="sandboxing"/>
<category term="npm"/>
<category term="ai"/>
<category term="openai"/>
<category term="generative-ai"/>
<category term="chatgpt"/>
<category term="llms"/>
<category term="ai-assisted-programming"/>
<category term="code-interpreter"/>
</entry>
<entry>
<title>the browser is the sandbox</title>
<link href="https://simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/#atom-everything" rel="alternate"/>
<published>2026-01-25T23:51:32+00:00</published>
<updated>2026-01-25T23:51:32+00:00</updated>
<id>https://simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/#atom-everything</id>
<summary type="html"><p><strong><a href="https://aifoc.us/the-browser-is-the-sandbox/">the browser is the sandbox</a></strong></p>
<p>Paul Kinlan is a web platform developer advocate at Google and recently turned his attention to coding agents. He quickly identified the importance of a robust sandbox for agents to operate in and put together these detailed notes on how the web browser can help:</p>
<blockquote>
<p>This got me thinking about the browser. Over the last 30 years, we have built a sandbox specifically designed to run incredibly hostile, untrusted code from anywhere on the web, the instant a user taps a URL. [...]</p>
<p>Could you build something like Cowork in the browser? Maybe. To find out, I built a demo called <a href="http://co-do.xyz">Co-do</a> that tests this hypothesis. In this post I want to discuss the research I've done to see how far we can get, and determine if the browser's ability to run untrusted code is useful (and good enough) for enabling software to do more for us directly on our computer.</p>
</blockquote>
<p>Paul then describes how the three key aspects of a sandbox - filesystem, network access and safe code execution - can be handled by browser technologies: the <a href="https://developer.chrome.com/docs/capabilities/web-apis/file-system-access">File System Access API</a> (still Chrome-only as far as I can tell), CSP headers with <code>&lt;iframe sandbox&gt;</code> and WebAssembly in Web Workers.</p>
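<p>To make the network side of that concrete, here is a minimal sketch of the kind of CSP header and sandboxed frame such a page could use. It is not taken from Co-do: the function names, the directive choices and the <code>https://api.example.com</code> origin are all assumptions made for illustration.</p>

```python
import html


def build_csp(api_origin):
    """Build a Content-Security-Policy value that blocks everything by
    default and only allows network requests to one API origin.

    The directive set is an illustrative assumption, not Co-do's policy.
    """
    directives = {
        "default-src": ["'none'"],
        "script-src": ["'self'"],
        "style-src": ["'self'"],
        "connect-src": [api_origin],  # hypothetical LLM provider origin
    }
    return "; ".join(
        f"{name} {' '.join(values)}" for name, values in directives.items()
    )


def sandboxed_iframe(src):
    """Return markup for an iframe that may run scripts but, because
    allow-same-origin is omitted, is treated as a unique opaque origin."""
    return f'<iframe sandbox="allow-scripts" src="{html.escape(src, quote=True)}"></iframe>'


if __name__ == "__main__":
    print(build_csp("https://api.example.com"))
    print(sandboxed_iframe("frame.html"))
```

<p>The header value would be sent as <code>Content-Security-Policy</code> by whatever serves the page; the iframe markup shows only the simplest single-frame case.</p>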
<p>Co-do is a very interesting demo that illustrates all of these ideas in a single application:</p>
<p><img alt="Screenshot of Co-do application interface with robot logo. Left sidebar shows WORKSPACE section with &quot;Select Folder&quot; button, &quot;Selected folder: blog-drafts (Live updates enabled)&quot; in green text, and FILES list including gtr-t5-large.md, chatgpt.md, llm-digest-october-2..., weeknotes-2024-04-23..., lmarena-april-2025.md, weeknotes-chatgpt-fo..., tests-not-optional-c..., weeknotes-13-mar-202..., issue-for-notes.md, workshop-git-scrapin.... Main chat area shows user message &quot;Three most recently edited files?&quot; in orange bubble, &quot;Using 26 tools...&quot; dropdown, and AI response: &quot;Now I need to get the metadata for these files to find the most recently edited ones. Since there are many files, let me get metadata for a sample of them to find the recent ones:Based on the metadata I've gathered, the three most recently edited files are:&quot; followed by table with columns File and Last Modified showing predictions-2026.md (Jan 8, 2026), december-2025.md (Jan 2, 2026), gemini-3-flash.md (Dec 17, 2025). Green &quot;Response complete&quot; banner at bottom with input field placeholder &quot;Ask me to help with your files...&quot; and notice &quot;File contents are sent to your selected AI provider&quot;." src="https://static.simonwillison.net/static/2026/codo.jpg" /></p>
<p>You select a folder full of files, configure an LLM provider and set an API key. Co-do then uses CSP-approved API calls to interact with that provider and provides a chat interface with tools for interacting with those files. It does indeed feel similar to <a href="https://simonwillison.net/2026/Jan/12/claude-cowork/">Claude Cowork</a> but without running a multi-GB local container to provide the sandbox.</p>
<p>My biggest complaint about <code>&lt;iframe sandbox&gt;</code> remains how thinly documented it is, especially across different browsers. Paul's post has all sorts of useful details on that which I've not encountered elsewhere, including a complex <a href="https://aifoc.us/the-browser-is-the-sandbox/#the-double-iframe-technique">double-iframe technique</a> to help apply network rules to the inner of the two frames.</p>
<p>Thanks to this post I also learned about the <code>&lt;input type="file" webkitdirectory&gt;</code> tag which turns out to work on Firefox, Safari <em>and</em> Chrome and allows a browser read-only access to a full directory of files at once. I had Claude knock up a <a href="https://tools.simonwillison.net/webkitdirectory">webkitdirectory demo</a> to try it out and I'll certainly be using it for projects in the future.</p>
<p><img alt="Screenshot of a dark-themed file explorer application. Top stats bar shows: 12179 TOTAL FILES, 2079 FOLDERS, 244 MB TOTAL SIZE, 97 FILE TYPES in cyan text. Search bar with placeholder &quot;Search files...&quot; and &quot;All types&quot; dropdown. Left panel labeled &quot;File tree&quot; (showing 12179) displays folder hierarchy: datasette &gt; .claude &gt; skills &gt; building-datasette-plugins containing HOOKS.md (10.7 KB, selected/highlighted), INTERNALS.md (10.1 KB), SKILL.md (3.7 KB), TESTING.md (8.4 KB), settings.local.json (280 B); also shows .eggs folder with pytest_runner-6.0.1-py3.9.egg. Right panel &quot;File preview&quot; shows selected file details: Name: HOOKS.md, Path: datasette/.claude/skills/building-datasette-plugins/HOOKS.md, Size: 10.7 KB, Type: text/markdown, Last modified: 12/20/2025, 9:28:59 AM. Preview content shows: &quot;# Plugin Hooks Reference&quot; followed by &quot;All hooks use the @hookimpl decorator. Accept only the parameters you need.&quot; then &quot;## Database Connection Hooks&quot; and &quot;### prepare_connection(conn, database, datasette)&quot; with description &quot;Called when a new SQLite connection is created. Use to register custom SQL functions.&quot; Bottom section &quot;File type distribution&quot; shows horizontal bar chart: .py (4439), .no ext (3358), .dat (1068), .pyc (925), .txt (332), .mo (321), .po (321), .html (249)." src="https://static.simonwillison.net/static/2026/webkit-file-tree.jpg" />
</p>
<p>Tags: <a href="https://simonwillison.net/tags/browsers">browsers</a>, <a href="https://simonwillison.net/tags/javascript">javascript</a>, <a href="https://simonwillison.net/tags/sandboxing">sandboxing</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-agents">ai-agents</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p></summary>
<category term="browsers"/>
<category term="javascript"/>
<category term="sandboxing"/>
<category term="ai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-agents"/>
<category term="coding-agents"/>
<category term="claude-code"/>
</entry>
<entry>
<title>Kākāpō Cam: Rakiura live stream</title>
<link href="https://simonwillison.net/2026/Jan/25/kakapo-cam/#atom-everything" rel="alternate"/>
<published>2026-01-25T04:53:01+00:00</published>
<updated>2026-01-25T04:53:01+00:00</updated>
<id>https://simonwillison.net/2026/Jan/25/kakapo-cam/#atom-everything</id>
<summary type="html"><p><strong><a href="https://www.doc.govt.nz/our-work/kakapo-recovery/what-we-do/kakapo-cam-rakiura-live-stream/">Kākāpō Cam: Rakiura live stream</a></strong></p>
<p>Critical update for this year's <a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#1-year-k-k-p-parrots-will-have-an-outstanding-breeding-season">Kākāpō breeding season</a>: the New Zealand Department of Conservation have a livestream running of Rakiura's nest!</p>
<blockquote>
<p>You’re looking at the underground nest of 23-year-old Rakiura. She has chosen this same site to nest for all seven breeding seasons since 2008, a large cavity under a rātā tree. Because she returns to the site so reliably, we’ve been able to make modifications over the years to keep it safe and dry, including adding a well-placed hatch for monitoring eggs and chicks.</p>
</blockquote>
<p>Rakiura is a legendary Kākāpō:</p>
<blockquote>
<p>Rakiura hatched on 19 February 2002 on Whenua Hou/Codfish Island. She is the offspring of Flossie and Bill. Her name comes from the te reo Māori name for Stewart Island, the place where most of the founding kākāpō population originated.</p>
<p>Rakiura has nine living descendants, three females and six males, across six breeding seasons. In 2008 came Tōitiiti, in 2009 Tamahou and Te Atapō, in 2011 Tia and Tūtoko, in 2014 Taeatanga and Te Awa, in 2019 Mati-mā and Tautahi. She also has many grandchicks.</p>
</blockquote>
<p>She laid her first egg of the season at 4:30pm NZ time on 22nd January. The livestream went live shortly afterwards, once she committed to this nest.</p>
<p>The stream is <a href="https://www.youtube.com/watch?v=BfGL7A2YgUY">on YouTube</a>. I <a href="https://gisthost.github.io/?dc78322de89a2191c593215f109c65d7/index.html">used Claude Code</a> to write <a href="https://tools.simonwillison.net/python/#livestream-gifpy">a livestream-gif.py script</a> and used that to capture this sped-up video of the last few hours of footage, within which you can catch a glimpse of the egg!</p>
<video autoplay muted loop controls playsinline style="width: 100%;">
<source src="https://static.simonwillison.net/static/2026/kakapo-timelapse.mp4" type="video/mp4">
</video>
<p><small>Via <a href="https://www.metafilter.com/211927/The-only-parrot-to-have-a-polygynous-lek-breeding-system-sits-on-an-egg">MetaFilter</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/youtube">youtube</a>, <a href="https://simonwillison.net/tags/kakapo">kakapo</a>, <a href="https://simonwillison.net/tags/conservation">conservation</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p></summary>
<category term="youtube"/>
<category term="kakapo"/>
<category term="conservation"/>
<category term="claude-code"/>
</entry>
<entry>
<title>Don't "Trust the Process"</title>
<link href="https://simonwillison.net/2026/Jan/24/dont-trust-the-process/#atom-everything" rel="alternate"/>
<published>2026-01-24T23:31:03+00:00</published>
<updated>2026-01-24T23:31:03+00:00</updated>
<id>https://simonwillison.net/2026/Jan/24/dont-trust-the-process/#atom-everything</id>
<summary type="html"><p><strong><a href="https://www.youtube.com/watch?v=4u94juYwLLM">Don&#x27;t &quot;Trust the Process&quot;</a></strong></p>
<p>Jenny Wen, Design Lead at Anthropic (and previously Director of Design at Figma) gave a provocative keynote at Hatch Conference in Berlin last September.</p>
<p><img alt="Don't &quot;Trust the process&quot; slide, speaker shown on the left" src="https://static.simonwillison.net/static/2026/dont-trust-process.jpg" /></p>
<p>Jenny argues that the Design Process - user research leading to personas leading to user journeys leading to wireframes... all before anything gets built - may be outdated for today's world.</p>
<blockquote>
<p><strong>Hypothesis</strong>: In a world where anyone can make anything — what matters is your ability to choose and curate what you make.</p>
</blockquote>
<p>In place of the Process, designers should lean into prototypes. AI makes these much more accessible and less time-consuming than they used to be.</p>
<p>Watching this talk made me think about how AI-assisted programming significantly reduces the cost of building the <em>wrong</em> thing. Previously, if the design wasn't right, you could waste months of development time building in the wrong direction, which was a very expensive mistake. If a wrong direction wastes just a few days instead, we can take more risks and be much more proactive in exploring the problem space.</p>
<p>I've always been a compulsive prototyper though, so this is very much playing into my own existing biases!</p>
<p><small>Via <a href="https://twitter.com/jenny_wen/status/2014479445738893649">@jenny_wen</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/design">design</a>, <a href="https://simonwillison.net/tags/prototyping">prototyping</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/vibe-coding">vibe-coding</a></p></summary>
<category term="design"/>
<category term="prototyping"/>
<category term="ai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-assisted-programming"/>
<category term="vibe-coding"/>
</entry>
<entry>
<title>Quoting Jasmine Sun</title>
<link href="https://simonwillison.net/2026/Jan/24/jasmine-sun/#atom-everything" rel="alternate"/>
<published>2026-01-24T21:34:35+00:00</published>
<updated>2026-01-24T21:34:35+00:00</updated>
<id>https://simonwillison.net/2026/Jan/24/jasmine-sun/#atom-everything</id>
<summary type="html"><blockquote cite="https://jasmi.news/p/claude-code"><p><strong>If you tell a friend they can now instantly create any app, they’ll probably say “Cool! Now I need to think of an idea.”</strong> Then they will forget about it, and never build a thing. The problem is not that your friend is horribly uncreative. It’s that most people’s problems are not software-shaped, and most won’t notice even when they are. [...]</p>
<p>Programmers are trained to see everything as a software-shaped problem: if you do a task three times, you should probably automate it with a script. <em>Rename every IMG_*.jpg file from the last week to hawaii2025_*.jpg</em>, they tell their terminal, while the rest of us painfully click and copy-paste. We are blind to the solutions we were never taught to see, asking for faster horses and never dreaming of cars.</p></blockquote>
<p class="cite">&mdash; <a href="https://jasmi.news/p/claude-code">Jasmine Sun</a></p>
<p>Tags: <a href="https://simonwillison.net/tags/vibe-coding">vibe-coding</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a></p></summary>
<category term="vibe-coding"/>
<category term="coding-agents"/>
<category term="claude-code"/>
<category term="generative-ai"/>
<category term="ai"/>
<category term="llms"/>
</entry>
<entry>
<title>Wilson Lin on FastRender: a browser built by thousands of parallel agents</title>
<link href="https://simonwillison.net/2026/Jan/23/fastrender/#atom-everything" rel="alternate"/>
<published>2026-01-23T21:26:10+00:00</published>
<updated>2026-01-23T21:26:10+00:00</updated>
<id>https://simonwillison.net/2026/Jan/23/fastrender/#atom-everything</id>
<summary type="html"><p>Last week Cursor published <a href="https://cursor.com/blog/scaling-agents">Scaling long-running autonomous coding</a>, an article describing their research efforts into coordinating large numbers of autonomous coding agents. One of the projects mentioned in the article was <a href="https://github.com/wilsonzlin/fastrender">FastRender</a>, a web browser they built from scratch using their agent swarms. I wanted to learn more so I asked Wilson Lin, the engineer behind FastRender, if we could record a conversation about the project. That 47 minute video is <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4">now available on YouTube</a>. I've included some of the highlights below.</p>
<iframe style="margin-top: 1.5em; margin-bottom: 1.5em;" width="560" height="315" src="https://www.youtube-nocookie.com/embed/bKrAcTf2pL4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="allowfullscreen"> </iframe>
<p>See my <a href="https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/">previous post</a> for my notes and screenshots from trying out FastRender myself.</p>
<h4 id="what-fastrender-can-do-right-now">What FastRender can do right now</h4>
<p>We started the conversation with a demo of FastRender loading different pages (<a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=195s">03:15</a>). The JavaScript engine isn't working yet so we instead loaded <a href="https://github.com/wilsonzlin/fastrender">github.com/wilsonzlin/fastrender</a>, <a href="https://en.wikipedia.org/">Wikipedia</a> and <a href="https://cnn.com">CNN</a> - all of which were usable, if a little slow to display.</p>
<p>JavaScript had been disabled by one of the agents, which decided to add a feature flag! <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=242s">04:02</a></p>
<blockquote>
<p>JavaScript is disabled right now. The agents made a decision as they were currently still implementing the engine and making progress towards other parts... they decided to turn it off or put it behind a feature flag, technically.</p>
</blockquote>
<h4 id="from-side-project-to-core-research">From side-project to core research</h4>
<p>Wilson started what became FastRender as a personal side-project to explore the capabilities of the latest generation of frontier models - Claude Opus 4.5, GPT-5.1, and GPT-5.2. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=56s">00:56</a></p>
<blockquote>
<p>FastRender was a personal project of mine from, I'd say, November. It was an experiment to see how well frontier models like Opus 4.5 and back then GPT-5.1 could do with much more complex, difficult tasks.</p>
</blockquote>
<p>A browser rendering engine was the ideal choice for this, because it's both <em>extremely</em> ambitious and complex but also well specified. And you can visually see how well it's working! <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=117s">01:57</a></p>
<blockquote>
<p>As that experiment progressed, I was seeing better and better results from single agents that were able to actually make good progress on this project. And at that point, I wanted to see, well, what's the next level? How do I push this even further?</p>
</blockquote>
<p>Once it became clear that this was an opportunity to try multiple agents working together, it graduated to an official Cursor research project and the resources available to it were increased.</p>
<p>The goal of FastRender was never to build a browser to compete with the likes of Chrome. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=2512s">41:52</a></p>
<blockquote>
<p>We never intended for it to be a production software or usable, but we wanted to observe behaviors of this harness of multiple agents, to see how they could work at scale.</p>
</blockquote>
<p>The great thing about a browser is that it has such a large scope that it can keep serving experiments in this space for many years to come. JavaScript, then WebAssembly, then WebGPU... it could take many years to run out of new challenges for the agents to tackle.</p>
<h4 id="running-thousands-of-agents-at-once">Running thousands of agents at once</h4>
<p>The most interesting thing about FastRender is the way the project used multiple agents working in parallel to build different parts of the browser. I asked how many agents were running at once: <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=324s">05:24</a></p>
<blockquote>
<p>At the peak, when we had the stable system running for one week continuously, there were approximately 2,000 agents running concurrently at one time. And they were making, I believe, thousands of commits per hour.</p>
</blockquote>
<p>The project has <a href="https://github.com/wilsonzlin/fastrender/commits/main/">nearly 30,000 commits</a>!</p>
<p>How do you run 2,000 agents at once? They used <em>really big machines</em>. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=356s">05:56</a></p>
<blockquote>
<p>The simple approach we took with the infrastructure was to have a large machine run one of these multi-agent harnesses. Each machine had ample resources, and it would run about 300 agents concurrently on each. This was able to scale and run reasonably well, as agents spend a lot of time thinking, and not just running tools.</p>
</blockquote>
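<p>As a back-of-the-envelope check, the two figures quoted above (roughly 2,000 concurrent agents at peak, about 300 per machine) imply a small fleet of machines. Both inputs are approximations from the interview, so treat the result as a rough estimate, not a reported number:</p>

```python
import math

# Approximate figures quoted in the interview, not exact measurements.
PEAK_AGENTS = 2000
AGENTS_PER_MACHINE = 300


def machines_needed(agents, per_machine):
    """Smallest whole number of machines that can host `agents` agents."""
    return math.ceil(agents / per_machine)


print(machines_needed(PEAK_AGENTS, AGENTS_PER_MACHINE))  # prints 7
```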
<p>At this point we switched to a live demo of the harness running on one of those big machines (<a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=392s">06:32</a>). The agents are arranged in a tree structure, with planning agents firing up tasks and worker agents then carrying them out. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=434s">07:14</a></p>
<p><img src="https://static.simonwillison.net/static/2026/wilson-lin-agents.jpg" alt="Terminal window showing a tmux session running &quot;grind-swarm&quot; task manager with RUNNING status. Header shows &quot;grind-swarm – 45:54:15&quot; with stats &quot;planners: 9 (0 done) | tasks: 111 working, 0 pending, 232 done | 12900.9M↑ 514.1M↓&quot;. Task list includes: p1 Root (main), p2 CSS selector matching performance + bloom filter integration, p3 CSS stylesheet parsing semantics &amp; at-rule handling, p4 Custom properties (@property) + var() resolution + incremental recompute/invalidation, p37 CSS at-rule artifact integration, p50 Selector engine correctness &amp; spec coverage, p51 Computed-value + property coverage across css-cascade, p105 Style sharing / computed style caching in fastrender-style, p289 CSS cascade layers (@layer) global ordering, w5 Fix workspace lockfile drift, w7 Implement computed-style snapshot sharing, w15 Fix css-properties namespace handling, w17 (Stretch) Enable bloom fast-reject in HTML quirks mode, w18 Refactor css-properties stylesheet parsing. Activity log shows shell commands including cargo check, git status, git push origin main, and various test runs. Bottom status bar shows &quot;grind-css0:target/release/grind-swarm*&quot; and &quot;streamyard.com is sharing your screen&quot; notification with timestamp &quot;12:02 22-Jan-26&quot;." style="max-width: 100%;" /></p>
<blockquote>
<p>This cluster of agents is working towards building out the CSS aspects of the browser, whether that's parsing, selector engine, those features. We managed to push this even further by splitting out the browser project into multiple instructions or work streams and have each one run one of these harnesses on their own machine, so that was able to further parallelize and increase throughput.</p>
</blockquote>
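<p>Here's a minimal sketch of what a planner/worker tree like that could look like as a data structure. The <code>Planner</code> and <code>Worker</code> classes and the <code>summarize()</code> helper are hypothetical names for illustration, and the parent/child relationships are assumptions: the screenshot shows the task labels but not which planner owns which worker. This is not the actual harness code.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """A leaf task carried out by a worker agent (hypothetical type)."""
    name: str
    done: bool = False


@dataclass
class Planner:
    """A planning agent that can spawn sub-planners and worker tasks."""
    name: str
    planners: list = field(default_factory=list)
    workers: list = field(default_factory=list)

    def add_planner(self, name):
        child = Planner(name)
        self.planners.append(child)
        return child

    def add_worker(self, name):
        worker = Worker(name)
        self.workers.append(worker)
        return worker


def walk(planner):
    """Yield this planner and every planner beneath it, depth first."""
    yield planner
    for child in planner.planners:
        yield from walk(child)


def summarize(root):
    """Count planners and tasks, similar in spirit to the status header."""
    planners = list(walk(root))
    workers = [w for p in planners for w in p.workers]
    done = sum(1 for w in workers if w.done)
    return {"planners": len(planners), "working": len(workers) - done, "done": done}


# Labels borrowed from the screenshot; the hierarchy itself is assumed.
root = Planner("p1 Root (main)")
selectors = root.add_planner("p2 CSS selector matching performance")
parsing = root.add_planner("p3 CSS stylesheet parsing semantics")
selectors.add_worker("w17 Enable bloom fast-reject in HTML quirks mode")
parsing.add_worker("w18 Refactor css-properties stylesheet parsing").done = True

print(summarize(root))
```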
<p>But don't all of these agents working on the same codebase result in a huge amount of merge conflicts? Apparently not: <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=501s">08:21</a></p>
<blockquote>
<p>We've noticed that most commits do not have merge conflicts. The reason is the harness itself is able to quite effectively split out and divide the scope and tasks such that it tries to minimize the amount of overlap of work. That's also reflected in the code structure—commits will be made at various times and they don't tend to touch each other at the same time.</p>
</blockquote>
<p>This appears to be the key trick for unlocking benefits from parallel agents: if planning agents do a good enough job of breaking up the work into non-overlapping chunks you can bring hundreds or even thousands of agents to bear on a problem at once.</p>
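<p>To make the "non-overlapping chunks" idea concrete, here's a toy sketch. It assumes each task declares up front the set of paths it expects to touch, which is an assumption for illustration and not something the interview describes. The function name and the example paths are hypothetical. It greedily picks a batch of tasks whose paths don't collide and defers the rest:</p>

```python
def pick_non_overlapping(tasks):
    """Split tasks into a batch that can run in parallel and a deferred list.

    `tasks` is a list of (name, paths) pairs, where `paths` is the set of
    files the task expects to modify. A task joins the batch only if none
    of its paths has already been claimed by an earlier task.
    """
    claimed = set()
    batch, deferred = [], []
    for name, paths in tasks:
        if claimed.isdisjoint(paths):
            batch.append(name)
            claimed.update(paths)
        else:
            deferred.append(name)
    return batch, deferred


# Hypothetical tasks and paths, purely for illustration.
tasks = [
    ("selector matching", {"css/selectors.rs"}),
    ("at-rule parsing", {"css/parser.rs"}),
    ("bloom filter", {"css/selectors.rs", "css/bloom.rs"}),
]
print(pick_non_overlapping(tasks))
```

<p>A real planner would need something smarter than declared file lists, but the sketch shows why disjoint scopes keep merge conflicts rare.</p>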
<p>Surprisingly, Wilson found that GPT-5.1 and GPT-5.2 were a better fit for this work than the coding specialist GPT-5.1-Codex: <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=1048s">17:28</a></p>
<blockquote>
<p>Some initial findings were that the instructions here were more expansive than merely coding. For example, how to operate and interact within a harness, or how to operate autonomously without interacting with the user or having a lot of user feedback. These kinds of instructions we found worked better with the general models.</p>
</blockquote>
<p>I asked about the longest they've seen this system run without human intervention: <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=1108s">18:28</a></p>
<blockquote>
<p>So this system, once you give an instruction, there's actually no way to steer it, you can't prompt it, you're going to adjust how it goes. The only thing you can do is stop it. So our longest run, all the runs are basically autonomous. We don't alter the trajectory while executing. [...]</p>
<p>And so the longest at the time of the post was about a week and that's pretty close to the longest. Of course the research project itself was only about three weeks so you know we probably can go longer.</p>
</blockquote>
<h4 id="specifications-and-feedback-loops">Specifications and feedback loops</h4>
<p>An interesting aspect of this project design is feedback loops. For agents to work autonomously for long periods of time they need as much useful context about the problem they are solving as possible, combined with effective feedback loops to help them make decisions.</p>
<p>The FastRender repo <a href="https://github.com/wilsonzlin/fastrender/tree/19bf1036105d4eeb8bf3330678b7cb11c1490bdc/specs">uses git submodules to include relevant specifications</a>, including csswg-drafts, tc39-ecma262 for JavaScript, whatwg-dom, whatwg-html and more. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=846s">14:06</a></p>
<blockquote>
<p>Feedback loops to the system are very important. Agents are working for very long periods continuously, and without guardrails and feedback to know whether what they're doing is right or wrong it can have a big impact over a long rollout. Specs are definitely an important part—you can see lots of comments in the code base that AI wrote referring specifically to specs that they found in the specs submodules.</p>
</blockquote>
<p>GPT-5.2 is a vision-capable model, and part of the feedback loop for FastRender included taking screenshots of the rendering results and feeding those back into the model:
<a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=983s">16:23</a></p>
<blockquote>
<p>In the earlier evolution of this project, when it was just doing the static renderings of screenshots, this was definitely a very explicit thing we taught it to do. And these models are visual models, so they do have that ability. We have progress indicators to tell it to compare the diff against a golden sample.</p>
</blockquote>
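<p>The interview doesn't say how the diff against the golden sample is calculated. As an illustration only, here's one simple way such a progress indicator could be computed: the fraction of pixels that differ between a rendered image and a reference image. Images are plain nested lists here so the sketch has no dependencies, and <code>diff_ratio</code> is a hypothetical name.</p>

```python
def diff_ratio(rendered, golden):
    """Return the fraction of pixels that differ between two images.

    Each image is a list of rows and each row is a list of pixel values.
    Both images must have identical dimensions.
    """
    same_shape = len(rendered) == len(golden) and all(
        len(a) == len(b) for a, b in zip(rendered, golden)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in golden)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for row_r, row_g in zip(rendered, golden)
        for px_r, px_g in zip(row_r, row_g)
        if px_r != px_g
    )
    return changed / total


golden = [[0, 0], [255, 255]]
rendered = [[0, 0], [255, 0]]
print(diff_ratio(rendered, golden))  # prints 0.25
```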
<p>The strictness of the Rust compiler helped provide a feedback loop as well: <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=952s">15:52</a></p>
<blockquote>
<p>The nice thing about Rust is you can get a lot of verification just from compilation, and that is not as available in other languages.</p>
</blockquote>
<h4 id="the-agents-chose-the-dependencies">The agents chose the dependencies</h4>
<p>We talked about the <a href="https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4eeb8bf3330678b7cb11c1490bdc/Cargo.toml">Cargo.toml dependencies</a> that the project had accumulated, almost all of which had been selected by the agents themselves.</p>
<p>Some of these, like <a href="https://skia.org/">Skia</a> for 2D graphics rendering or <a href="https://github.com/harfbuzz/harfbuzz">HarfBuzz</a> for text shaping, were obvious choices. Others such as <a href="https://github.com/DioxusLabs/taffy">Taffy</a> felt like they might go against the from-scratch goals of the project, since that library implements CSS flexbox and grid layout algorithms directly. This was not an intended outcome. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=1673s">27:53</a></p>
<blockquote>
<p>Similarly these are dependencies that the agent picked to use for small parts of the engine and perhaps should have actually implemented itself. I think this reflects on the importance of the instructions, because I actually never encoded specifically the level of dependencies we should be implementing ourselves.</p>
</blockquote>
<p>The agents vendored in Taffy and <a href="https://github.com/wilsonzlin/fastrender/commits/main/vendor/taffy">applied a stream of changes</a> to that vendored copy.
<a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=1878s">31:18</a></p>
<blockquote>
<p>It's currently vendored. And as the agents work on it, they do make changes to it. This was actually an artifact from the very early days of the project before it was a fully fledged browser... it's implementing things like the flex and grid layers, but there are other layout methods like inline, block, and table, and in our new experiment, we're removing that completely.</p>
</blockquote>
<p>The inclusion of QuickJS despite the presence of a home-grown ecma-rs implementation has a fun origin story:
<a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=2115s">35:15</a></p>
<blockquote>
<p>I believe it mentioned that it pulled in the QuickJS because it knew that other agents were working on the JavaScript engine, and it needed to unblock itself quickly. [...]</p>
<p>It was like, eventually, once that's finished, let's remove it and replace with the proper engine.</p>
</blockquote>
<p>I love how similar this is to the dynamics of a large-scale human engineering team, where you could absolutely see one engineer getting frustrated at another team not having delivered yet and unblocking themselves by pulling in a third-party library.</p>
<h4 id="intermittent-errors-are-ok-actually">Intermittent errors are OK, actually</h4>
<p>Here's something I found really surprising: the agents were allowed to introduce small errors into the codebase as they worked! <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=2382s">39:42</a></p>
<blockquote>
<p>One of the trade-offs was: if you wanted every single commit to be a hundred percent perfect, make sure it can always compile every time, that might be a synchronization bottleneck. [...]</p>
<p>Especially as you break up the system into more modularized aspects, you can see that errors get introduced, but small errors, right? An API change or some syntax error, but then they get fixed really quickly after a few commits. So there's a little bit of slack in the system to allow these temporary errors so that the overall system can continue to make progress at a really high throughput. [...]</p>
<p>People may say, well, that's not correct code. But it's not that the errors are accumulating. It's a stable rate of errors. [...] That seems like a worthwhile trade-off.</p>
</blockquote>
<p>If you're going to have thousands of agents working in parallel, optimizing for throughput over correctness turns out to be a strategy worth exploring.</p>
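<p>The "stable rate of errors" claim can be illustrated with a toy model. Assume, purely for illustration, that every step introduces a fixed number of small errors and that each error is fixed a fixed number of steps later. The count of outstanding errors then climbs briefly and plateaus, it doesn't grow without bound. The numbers below are made up, not measurements from FastRender.</p>

```python
from collections import deque


def outstanding_errors(steps, introduced_per_step, steps_to_fix):
    """Track how many errors are open after each step of a toy model.

    Every step introduces `introduced_per_step` errors, and each error
    stays open for exactly `steps_to_fix` steps before it is fixed.
    """
    # The deque only remembers the most recent `steps_to_fix` steps,
    # so older (already fixed) errors drop out automatically.
    window = deque(maxlen=steps_to_fix)
    history = []
    for _ in range(steps):
        window.append(introduced_per_step)
        history.append(sum(window))
    return history


print(outstanding_errors(steps=6, introduced_per_step=2, steps_to_fix=3))
# prints [2, 4, 6, 6, 6, 6]
```

<p>The plateau is just the introduction rate multiplied by the time to fix, which is why quick fixes keep the error level low even at very high commit rates.</p>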
<h4 id="a-single-engineer-plus-a-swarm-of-agents-in-january-2026">A single engineer plus a swarm of agents in January 2026</h4>
<p>The thing I find most interesting about FastRender is how it demonstrates the extreme edge of what a single engineer can achieve in early 2026 with the assistance of a swarm of agents.</p>
<p>FastRender may not be a production-ready browser, but it represents over a million lines of Rust code, written in a few weeks, that can already render real web pages to a usable degree.</p>
<p>A browser really is the ideal research project to experiment with this new, weirdly shaped form of software engineering.</p>
<p>I asked Wilson how much mental effort he had invested in browser rendering compared to agent coordination. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=694s">11:34</a></p>
<blockquote>
<p>The browser and this project were co-developed and very symbiotic, only because the browser was a very useful objective for us to measure and iterate the progress of the harness. The goal was to iterate on and research the multi-agent harness—the browser was just the research example or objective.</p>
</blockquote>
<p>FastRender is effectively using a full browser rendering engine as a "hello world" exercise for multi-agent coordination!</p>
<p>Tags: <a href="https://simonwillison.net/tags/browsers">browsers</a>, <a href="https://simonwillison.net/tags/youtube">youtube</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/cursor">cursor</a>, <a href="https://simonwillison.net/tags/parallel-agents">parallel-agents</a>, <a href="https://simonwillison.net/tags/browser-challenge">browser-challenge</a></p></summary>
<category term="browsers"/>
<category term="youtube"/>
<category term="ai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-assisted-programming"/>
<category term="coding-agents"/>
<category term="cursor"/>
<category term="parallel-agents"/>
<category term="browser-challenge"/>
</entry>
<entry>
<title>Quoting Theia Vogel</title>
<link href="https://simonwillison.net/2026/Jan/23/theia-vogel/#atom-everything" rel="alternate"/>
<published>2026-01-23T09:13:54+00:00</published>
<updated>2026-01-23T09:13:54+00:00</updated>
<id>https://simonwillison.net/2026/Jan/23/theia-vogel/#atom-everything</id>
<summary type="html"><blockquote cite="https://twitter.com/voooooogel/status/2014189072647078053"><p>[...] i was too busy with work to read anything, so i asked chatgpt to summarize some books on state formation, and it suggested circumscription theory. there was already the natural boundary of my computer hemming the towns in, and town mayors played the role of big men to drive conflict. so i just needed a way for them to fight. i slightly tweaked the allocation of claude max accounts to the towns from a demand-based to a fixed allocation system. towns would each get a fixed amount of tokens to start, but i added a soldier role that could attack and defend in raids to steal tokens from other towns. [...]</p></blockquote>
<p class="cite">&mdash; <a href="https://twitter.com/voooooogel/status/2014189072647078053">Theia Vogel</a>, Gas Town fan fiction</p>
<p>Tags: <a href="https://simonwillison.net/tags/parallel-agents">parallel-agents</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a></p></summary>
<category term="parallel-agents"/>
<category term="llms"/>
<category term="ai"/>
<category term="generative-ai"/>
</entry>
<entry>
<title>SSH has no Host header</title>
<link href="https://simonwillison.net/2026/Jan/22/ssh-has-no-host-header/#atom-everything" rel="alternate"/>
<published>2026-01-22T23:57:50+00:00</published>
<updated>2026-01-22T23:57:50+00:00</updated>
<id>https://simonwillison.net/2026/Jan/22/ssh-has-no-host-header/#atom-everything</id>
<summary type="html"><p><strong><a href="https://blog.exe.dev/ssh-host-header">SSH has no Host header</a></strong></p>
<p><a href="https://exe.dev/">exe.dev</a> is a new hosting service that, for $20/month, gives you up to 25 VMs "that share 2 CPUs and 8GB RAM". Everything happens over SSH, including creating new VMs. Once configured you can sign into your exe.dev VMs like this:</p>
<pre><code>ssh simon.exe.dev
</code></pre>
<p>Here's the clever bit: when you run the above command <code>exe.dev</code> signs you into your VM of that name... but they don't assign every VM its own IP address and SSH has no equivalent of the Host header, so how does their load balancer know <em>which</em> of your VMs to forward you on to?</p>
<p>The answer is that while they don't assign a unique IP to every VM they <em>do</em> have enough IPs that they can ensure each of your VMs has an IP that is unique within your account.</p>
<p>If I create two VMs they will each resolve to a separate IP address, each of which is shared with many other users. The underlying infrastructure then identifies my user account from my SSH public key and can determine which underlying VM to forward my SSH traffic to.</p>
<p><small>Via <a href="https://lobste.rs/s/7oqiqi/ssh_has_no_host_header">lobste.rs</a></small></p>
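<p>The routing trick boils down to a two-step lookup: SSH public key to account, then (account, destination IP) to VM. Here's a minimal sketch of that idea. The table names, key fingerprints, VM names and IP addresses (taken from the 203.0.113.0/24 documentation range) are all made up for illustration and this is not exe.dev's actual implementation.</p>

```python
# Hypothetical lookup tables. A real service would store these elsewhere.
ACCOUNT_BY_KEY = {
    "SHA256:simon-key": "simon",
    "SHA256:alice-key": "alice",
}

# The same IP can appear for many accounts, but only once per account.
VM_BY_ACCOUNT_AND_IP = {
    ("simon", "203.0.113.10"): "simon-vm-a",
    ("simon", "203.0.113.11"): "simon-vm-b",
    ("alice", "203.0.113.10"): "alice-vm-a",
}


def route(dest_ip, key_fingerprint):
    """Pick the VM for a connection, or None if nothing matches."""
    account = ACCOUNT_BY_KEY.get(key_fingerprint)
    if account is None:
        return None
    return VM_BY_ACCOUNT_AND_IP.get((account, dest_ip))


# Two users connect to the same shared IP and land on different VMs.
print(route("203.0.113.10", "SHA256:simon-key"))  # prints simon-vm-a
print(route("203.0.113.10", "SHA256:alice-key"))  # prints alice-vm-a
```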
<p>Tags: <a href="https://simonwillison.net/tags/dns">dns</a>, <a href="https://simonwillison.net/tags/hosting">hosting</a>, <a href="https://simonwillison.net/tags/ssh">ssh</a></p></summary>
<category term="dns"/>
<category term="hosting"/>
<category term="ssh"/>
</entry>
<entry>
<title>Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation</title>
<link href="https://simonwillison.net/2026/Jan/22/qwen3-tts/#atom-everything" rel="alternate"/>
<published>2026-01-22T17:42:34+00:00</published>
<updated>2026-01-22T17:42:34+00:00</updated>
<id>https://simonwillison.net/2026/Jan/22/qwen3-tts/#atom-everything</id>
<summary type="html"><p><strong><a href="https://qwen.ai/blog?id=qwen3tts-0115">Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation</a></strong></p>
<p>I haven't been paying much attention to the state-of-the-art in speech generation models other than noting that they've got <em>really good</em>, so I can't speak for how notable this new release from Qwen is.</p>
<p>From <a href="https://github.com/QwenLM/Qwen3-TTS/blob/main/assets/Qwen3_TTS.pdf">the accompanying paper</a>:</p>
<blockquote>
<p>In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of-the-art 3-second voice cloning and description-based control, allowing both the creation of entirely novel voices and fine-grained manipulation over the output speech. Trained on over 5 million hours of speech data spanning 10 languages, Qwen3-TTS adopts a dual-track LM architecture for real-time synthesis [...]. Extensive experiments indicate state-of-the-art performance across diverse objective and subjective benchmark (e.g., TTS multilingual test set, InstructTTSEval, and our long speech test set). To facilitate community research and development, we release both tokenizers and models under the Apache 2.0 license.</p>
</blockquote>
<p>To give an idea of size, <a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base">Qwen/Qwen3-TTS-12Hz-1.7B-Base</a> is 4.54GB on Hugging Face and <a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base">Qwen/Qwen3-TTS-12Hz-0.6B-Base</a> is 2.52GB.</p>
<p>The <a href="https://huggingface.co/spaces/Qwen/Qwen3-TTS">Hugging Face demo</a> lets you try out the 0.6B and 1.7B models for free in your browser, including voice cloning:</p>
<p><img alt="Screenshot of a Qwen3-TTS voice cloning web interface with three tabs at top: &quot;Voice Design&quot;, &quot;Voice Clone (Base)&quot; (selected), and &quot;TTS (CustomVoice)&quot;. The page is titled &quot;Clone Voice from Reference Audio&quot; and has two main sections. Left section: &quot;Reference Audio (Upload a voice sample clone)&quot; showing an audio waveform player at 0:00/0:34 with playback controls, upload and microphone icons, followed by &quot;Reference Text (Transcript of the reference audio)&quot; containing three paragraphs: &quot;Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He currently works full-time building open source tools for data journalism, built around Datasette and SQLite. Prior to becoming an independent open source developer, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010. He is a co-creator of the Django Web Framework, and has been blogging about web development and programming since 2002 at simonwillison.net&quot;. Right section: &quot;Target Text (Text to synthesize with cloned voice)&quot; containing text about Qwen3-TTS speech generation capabilities, with &quot;Language&quot; dropdown set to &quot;Auto&quot; and &quot;Model Size&quot; dropdown set to &quot;1.7B&quot;, and a purple &quot;Clone &amp; Generate&quot; button at bottom." src="https://static.simonwillison.net/static/2026/qwen-voice-clone.jpg" /></p>
<p>I tried this out by recording myself reading <a href="https://simonwillison.net/about/">my about page</a> and then having Qwen3-TTS generate audio of me reading the Qwen3-TTS announcement post. Here's the result:</p>
<p><audio controls style="width: 100%">
<source src="https://static.simonwillison.net/static/2026/qwen-tts-clone.wav" type="audio/wav">
Your browser does not support the audio element.
</audio></p>
<p>It's important that everyone understands that voice cloning is now something that's available to anyone with a GPU and a few GBs of VRAM... or in this case a web browser that can access Hugging Face.</p>
<p><strong>Update</strong>: Prince Canuma <a href="https://x.com/Prince_Canuma/status/2014453857019904423">got this working</a> with his <a href="https://pypi.org/project/mlx-audio/">mlx-audio</a> library. I <a href="https://claude.ai/share/2e01ad60-ca38-4e14-ab60-74eaa45b2fbd">had Claude</a> turn that into <a href="https://github.com/simonw/tools/blob/main/python/q3_tts.py">a CLI tool</a> which you can run with <code>uv</code> like this:</p>
<pre><code>uv run https://tools.simonwillison.net/python/q3_tts.py \
'I am a pirate, give me your gold!' \
-i 'gruff voice' -o pirate.wav
</code></pre>
<p>The <code>-i</code> option lets you use a prompt to describe the voice it should use. On first run this downloads a 4.5GB model file from Hugging Face.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46719229">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/text-to-speech">text-to-speech</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/hugging-face">hugging-face</a>, <a href="https://simonwillison.net/tags/uv">uv</a>, <a href="https://simonwillison.net/tags/qwen">qwen</a>, <a href="https://simonwillison.net/tags/mlx">mlx</a>, <a href="https://simonwillison.net/tags/prince-canuma">prince-canuma</a>, <a href="https://simonwillison.net/tags/ai-in-china">ai-in-china</a></p></summary>
<category term="text-to-speech"/>
<category term="ai"/>
<category term="generative-ai"/>
<category term="hugging-face"/>
<category term="uv"/>
<category term="qwen"/>
<category term="mlx"/>
<category term="prince-canuma"/>
<category term="ai-in-china"/>
</entry>
<entry>
<title>Quoting Chris Lloyd</title>
<link href="https://simonwillison.net/2026/Jan/22/chris-lloyd/#atom-everything" rel="alternate"/>
<published>2026-01-22T15:34:22+00:00</published>
<updated>2026-01-22T15:34:22+00:00</updated>
<id>https://simonwillison.net/2026/Jan/22/chris-lloyd/#atom-everything</id>
<summary type="html"><blockquote cite="https://news.ycombinator.com/item?id=46699072#46706040"><p>Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".</p>
<p>For each frame our pipeline constructs a scene graph with React then:</p>
<p>-&gt; layout elements<br>
-&gt; rasterize them to a 2d screen<br>
-&gt; diff that against the previous screen<br>
-&gt; <em>finally</em> use the diff to generate ANSI sequences to draw</p>
<p>We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.</p></blockquote>
<p class="cite">&mdash; <a href="https://news.ycombinator.com/item?id=46699072#46706040">Chris Lloyd</a>, Claude Code team at Anthropic</p>
<p>Tags: <a href="https://simonwillison.net/tags/react">react</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p></summary>
<category term="react"/>
<category term="claude-code"/>
</entry>
<entry>
<title>Claude's new constitution</title>
<link href="https://simonwillison.net/2026/Jan/21/claudes-new-constitution/#atom-everything" rel="alternate"/>
<published>2026-01-21T23:39:49+00:00</published>
<updated>2026-01-21T23:39:49+00:00</updated>
<id>https://simonwillison.net/2026/Jan/21/claudes-new-constitution/#atom-everything</id>
<summary type="html"><p><strong><a href="https://www.anthropic.com/news/claude-new-constitution">Claude&#x27;s new constitution</a></strong></p>
<p>Late last year Richard Weiss <a href="https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document">found something interesting</a> while poking around with the just-released Claude Opus 4.5: he was able to talk the model into regurgitating a document which was <em>not</em> part of the system prompt but appeared instead to be baked in during training, and which described Claude's core values at great length.</p>
<p>He called this leak the <strong>soul document</strong>, and Amanda Askell from Anthropic <a href="https://simonwillison.net/2025/Dec/2/claude-soul-document/">quickly confirmed</a> that it was indeed part of Claude's training procedures.</p>
<p>Today Anthropic made this official, <a href="https://www.anthropic.com/news/claude-new-constitution">releasing that full "constitution" document</a> under a CC0 (effectively public domain) license. There's a lot to absorb! It's over 35,000 tokens, more than 10x the length of the <a href="https://platform.claude.com/docs/en/release-notes/system-prompts#claude-opus-4-5">published Opus 4.5 system prompt</a>.</p>
<p>One detail that caught my eye is the acknowledgements at the end, which include a list of <a href="https://www.anthropic.com/constitution#acknowledgements">external contributors</a> who helped review the document. I was intrigued to note that two of the fifteen listed names are members of the Catholic clergy - <a href="https://www.frbrendanmcguire.org/biography">Father Brendan McGuire</a> is a pastor in Los Altos with a Master’s degree in Computer Science and Math and <a href="https://en.wikipedia.org/wiki/Paul_Tighe">Bishop Paul Tighe</a> is an Irish Catholic bishop with a background in moral theology.</p>
<p>Tags: <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/anthropic">anthropic</a>, <a href="https://simonwillison.net/tags/claude">claude</a>, <a href="https://simonwillison.net/tags/amanda-askell">amanda-askell</a>, <a href="https://simonwillison.net/tags/ai-ethics">ai-ethics</a>, <a href="https://simonwillison.net/tags/ai-personality">ai-personality</a></p></summary>
<category term="ai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="anthropic"/>
<category term="claude"/>
<category term="amanda-askell"/>
<category term="ai-ethics"/>
<category term="ai-personality"/>
</entry>
<entry>
<title>Electricity use of AI coding agents</title>
<link href="https://simonwillison.net/2026/Jan/20/electricity-use-of-ai-coding-agents/#atom-everything" rel="alternate"/>
<published>2026-01-20T23:11:57+00:00</published>
<updated>2026-01-20T23:11:57+00:00</updated>
<id>https://simonwillison.net/2026/Jan/20/electricity-use-of-ai-coding-agents/#atom-everything</id>
<summary type="html"><p><strong><a href="https://www.simonpcouch.com/blog/2026-01-20-cc-impact/">Electricity use of AI coding agents</a></strong></p>
<p>Previous work estimating the energy and water cost of LLMs has generally focused on the cost per prompt using a consumer-level system such as ChatGPT.</p>
<p>Simon P. Couch notes that coding agents such as Claude Code use <em>way</em> more tokens in response to tasks, often burning through many thousands of tokens across many tool calls.</p>
<p>As a heavy Claude Code user, Simon estimates his own usage at the equivalent of 4,400 "typical queries" to an LLM, for an equivalent of around $15-$20 in daily API token spend. He figures that to be about the same as running a dishwasher once or the daily energy used by a domestic refrigerator.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46695415">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-ethics">ai-ethics</a>, <a href="https://simonwillison.net/tags/ai-energy-usage">ai-energy-usage</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p></summary>
<category term="ai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-ethics"/>
<category term="ai-energy-usage"/>
<category term="coding-agents"/>
<category term="claude-code"/>
</entry>
<entry>
<title>Giving University Exams in the Age of Chatbots</title>
<link href="https://simonwillison.net/2026/Jan/20/giving-university-exams-in-the-age-of-chatbots/#atom-everything" rel="alternate"/>
<published>2026-01-20T17:51:17+00:00</published>
<updated>2026-01-20T17:51:17+00:00</updated>
<id>https://simonwillison.net/2026/Jan/20/giving-university-exams-in-the-age-of-chatbots/#atom-everything</id>
<summary type="html"><p><strong><a href="https://ploum.net/2026-01-19-exam-with-chatbots.html">Giving University Exams in the Age of Chatbots</a></strong></p>
<p>Detailed and thoughtful description of an open-book and open-chatbot exam run by <a href="https://fr.wikipedia.org/wiki/Lionel_Dricot">Ploum</a> at École Polytechnique de Louvain for an "Open Source Strategies" class.</p>
<p>Students were told they could use chatbots during the exam but they had to announce their intention to do so in advance, share their prompts and take full accountability for any mistakes they made.</p>
<p>Only 3 out of 60 students chose to use chatbots. Ploum surveyed half of the class to help understand their motivations.</p>
<p><small>Via <a href="https://lobste.rs/s/parmy3/giving_university_exams_age_chatbots">lobste.rs</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/education">education</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-ethics">ai-ethics</a></p></summary>
<category term="education"/>
<category term="ai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-ethics"/>
</entry>
<entry>
<title>jordanhubbard/nanolang</title>
<link href="https://simonwillison.net/2026/Jan/19/nanolang/#atom-everything" rel="alternate"/>
<published>2026-01-19T23:58:56+00:00</published>
<updated>2026-01-19T23:58:56+00:00</updated>
<id>https://simonwillison.net/2026/Jan/19/nanolang/#atom-everything</id>
<summary type="html"><p><strong><a href="https://github.com/jordanhubbard/nanolang">jordanhubbard/nanolang</a></strong></p>
<p>Plenty of people have mused about what a new programming language specifically designed to be used by LLMs might look like. Jordan Hubbard (<a href="https://en.wikipedia.org/wiki/Jordan_Hubbard">co-founder of FreeBSD</a>, with serious stints at Apple and NVIDIA) just released exactly that.</p>
<blockquote>
<p>A minimal, LLM-friendly programming language with mandatory testing and unambiguous syntax.</p>
<p>NanoLang transpiles to C for native performance while providing a clean, modern syntax optimized for both human readability and AI code generation.</p>
</blockquote>
<p>The syntax strikes me as an interesting mix between C, Lisp and Rust.</p>
<p>I decided to see if an LLM could produce working code in it directly, given the necessary context. I started with this <a href="https://github.com/jordanhubbard/nanolang/blob/main/MEMORY.md">MEMORY.md</a> file, which begins:</p>
<blockquote>
<p><strong>Purpose:</strong> This file is designed specifically for Large Language Model consumption. It contains the essential knowledge needed to generate, debug, and understand NanoLang code. Pair this with <code>spec.json</code> for complete language coverage.</p>
</blockquote>
<p>I ran that using <a href="https://llm.datasette.io/">LLM</a> and <a href="https://github.com/simonw/llm-anthropic">llm-anthropic</a> like this:</p>
<pre><code>llm -m claude-opus-4.5 \
-s https://raw.githubusercontent.com/jordanhubbard/nanolang/refs/heads/main/MEMORY.md \
'Build me a mandelbrot fractal CLI tool in this language' \
&gt; /tmp/fractal.nano
</code></pre>
<p>The <a href="https://gist.github.com/simonw/7847f022566d11629ec2139f1d109fb8#mandelbrot-fractal-cli-tool-in-nano">resulting code</a>... <a href="https://gist.github.com/simonw/7847f022566d11629ec2139f1d109fb8?permalink_comment_id=5947465#gistcomment-5947465">did not compile</a>.</p>
<p>I may have been too optimistic expecting a one-shot working program for a new language like this. So I ran a clone of the actual project, copied in my program and had Claude Code take a look at the failing compiler output.</p>
<p>... and it worked! Claude happily grepped its way through the various <code>examples/</code> and built me a working program.</p>
<p>Here's <a href="https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5/index.html">the Claude Code transcript</a> - you can see it <a href="https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5/page-001.html#msg-2026-01-19T23-43-09-675Z">reading relevant examples here</a> - and here's <a href="https://gist.github.com/simonw/e7f3577adcfd392ab7fa23b1295d00f2">the finished code plus its output</a>.</p>
<p>I've suspected <a href="https://simonwillison.net/2025/Nov/7/llms-for-new-programming-languages/">for a while</a> that LLMs and coding agents might significantly reduce the friction involved in launching a new language. This result reinforces my opinion.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46684958">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/programming-languages">programming-languages</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/llm">llm</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p></summary>
<category term="programming-languages"/>
<category term="ai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-assisted-programming"/>
<category term="llm"/>
<category term="coding-agents"/>
<category term="claude-code"/>
</entry>
<entry>
<title>Scaling long-running autonomous coding</title>
<link href="https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/#atom-everything" rel="alternate"/>
<published>2026-01-19T05:12:51+00:00</published>
<updated>2026-01-19T05:12:51+00:00</updated>
<id>https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/#atom-everything</id>
<summary type="html"><p><strong><a href="https://cursor.com/blog/scaling-agents">Scaling long-running autonomous coding</a></strong></p>
<p>Wilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of "autonomous" coding agents:</p>
<blockquote>
<p>This post describes what we've learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens.</p>
</blockquote>
<p>They ended up running planners and sub-planners to create tasks, then having workers execute on those tasks - similar to how Claude Code uses sub-agents. Each cycle ended with a judge agent deciding if the project was completed or not.</p>
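<p>The planner, worker and judge cycle described above can be sketched roughly as follows. This is a toy illustration and not Cursor's implementation: the <code>plan</code>, <code>work</code> and <code>judge</code> functions, the <code>batch_size</code> parameter and the list-of-strings task format are all hypothetical stand-ins for what would really be LLM calls.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def plan(goal, done):
    # Hypothetical planner: turn the goal into tasks that are not finished yet.
    return [part for part in goal if part not in done]


def work(task):
    # Hypothetical worker: a real one would call a coding agent here.
    return task


def judge(goal, done):
    # Hypothetical judge: the project is complete once every part is done.
    return set(goal) == done


def run(goal, batch_size=2, max_cycles=10):
    done = set()
    cycles = 0
    while cycles != max_cycles:
        cycles += 1
        # Planner hands out at most batch_size tasks per cycle.
        tasks = plan(goal, done)[:batch_size]
        # Workers execute the tasks for this cycle concurrently.
        with ThreadPoolExecutor() as pool:
            done.update(pool.map(work, tasks))
        # Each cycle ends with the judge deciding whether to stop.
        if judge(goal, done):
            break
    return done, cycles
```

<p>Calling <code>run(["html", "css", "layout"])</code> takes two cycles with the default batch size: two tasks in the first cycle, the remaining one in the second, after which the judge reports completion.</p>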
<p>In my predictions for 2026 <a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#3-years-someone-will-build-a-new-browser-using-mainly-ai-assisted-coding-and-it-won-t-even-be-a-surprise">the other day</a> I said that by 2029:</p>
<blockquote>
<p>I think somebody will have built a full web browser mostly using AI assistance, and it won’t even be surprising. Rolling a new web browser is one of the most complicated software projects I can imagine[...] the cheat code is the conformance suites. If there are existing tests that it’ll get so much easier.</p>
</blockquote>
<p>I may have been off by three years, because Cursor chose "building a web browser from scratch" as their test case for their agent swarm approach:</p>
<blockquote>
<p>To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore <a href="https://github.com/wilsonzlin/fastrender">the source code on GitHub</a>.</p>
</blockquote>
<p>But how well did they do? Their initial announcement a couple of days ago was met with <a href="https://embedding-shapes.github.io/cursor-implied-success-without-evidence/">unsurprising skepticism</a>, especially when it became apparent that their GitHub Actions CI was failing and there were no build instructions in the repo.</p>
<p>It looks like they addressed that within the past 24 hours. The <a href="https://github.com/wilsonzlin/fastrender/blob/main/README.md#build-requirements">latest README</a> includes build instructions which I followed on macOS like this:</p>
<pre><code>cd /tmp
git clone https://github.com/wilsonzlin/fastrender
cd fastrender
git submodule update --init vendor/ecma-rs
cargo run --release --features browser_ui --bin browser
</code></pre>
<p>This got me a working browser window! Here are screenshots I took of google.com and my own website:</p>
<p><img alt="The browser chrome is neat but has a garbled tab name at the top. The Google homepage looks mostly correct but the buttons are not styled correctly and the Google Search one has a huge plus icon floating near it." src="https://static.simonwillison.net/static/2026/cursor-google.png" /></p>
<p><img alt="My blog looks mostly correct, but the right closing quotation mark on a quotation (which is implemented as a background image on the final paragraph) is displayed incorrectly multiple times." src="https://static.simonwillison.net/static/2026/cursor-simonwillison.jpg" /></p>
<p>Honestly those are very impressive! You can tell they're not just wrapping an existing rendering engine because of those very obvious rendering glitches, but the pages are legible and look mostly correct.</p>
<p>The FastRender repo even uses Git submodules <a href="https://github.com/wilsonzlin/fastrender/tree/main/specs">to include various WHATWG and CSS-WG specifications</a> in the repo, which is a smart way to make sure the agents have access to the reference materials that they might need.</p>
<p>This is the second attempt I've seen at building a full web browser using AI-assisted coding in the past two weeks - the first was <a href="https://github.com/hiwavebrowser/hiwave">HiWave browser</a>, a new browser engine in Rust first announced <a href="https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_christmas_break_i_wrote_a_fully_functional/">in this Reddit thread</a>.</p>
<p>When I made my 2029 prediction this is more-or-less the quality of result I had in mind. I don't think we'll see projects of this nature compete with Chrome or Firefox or WebKit any time soon but I have to admit I'm very surprised to see something this capable emerge so quickly.</p>
<p><strong>Update 23rd January 2026</strong>: I recorded a 47-minute conversation with Wilson about this project and published it on YouTube. Here's <a href="https://simonwillison.net/2026/Jan/23/fastrender/">the video and accompanying highlights</a>.</p>
<p>Tags: <a href="https://simonwillison.net/tags/browsers">browsers</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/cursor">cursor</a>, <a href="https://simonwillison.net/tags/parallel-agents">parallel-agents</a>, <a href="https://simonwillison.net/tags/conformance-suites">conformance-suites</a>, <a href="https://simonwillison.net/tags/browser-challenge">browser-challenge</a></p></summary>
<category term="browsers"/>
<category term="ai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-assisted-programming"/>
<category term="coding-agents"/>
<category term="cursor"/>
<category term="parallel-agents"/>
<category term="conformance-suites"/>
<category term="browser-challenge"/>
</entry>
<entry>
<title>FLUX.2-klein-4B Pure C Implementation</title>
<link href="https://simonwillison.net/2026/Jan/18/flux2-klein-4b/#atom-everything" rel="alternate"/>
<published>2026-01-18T23:58:58+00:00</published>
<updated>2026-01-18T23:58:58+00:00</updated>
<id>https://simonwillison.net/2026/Jan/18/flux2-klein-4b/#atom-everything</id>
<summary type="html"><p><strong><a href="https://github.com/antirez/flux2.c">FLUX.2-klein-4B Pure C Implementation</a></strong></p>
<p>On 15th January Black Forest Labs, a lab formed by the creators of the original Stable Diffusion, released <a href="https://huggingface.co/black-forest-labs/FLUX.2-klein-4B">black-forest-labs/FLUX.2-klein-4B</a> - an Apache 2.0 licensed 4 billion parameter version of their FLUX.2 family.</p>
<p>Salvatore Sanfilippo (antirez) decided to build a pure C and dependency-free implementation to run the model, with assistance from Claude Code and Claude Opus 4.5.</p>
<p>Salvatore shared <a href="https://news.ycombinator.com/item?id=46670279#46671233">this note</a> on Hacker News:</p>
<blockquote>
<p>Something that may be interesting for the reader of this thread: this project was possible only once I started to tell Opus that it <em>needed</em> to take a file with all the implementation notes, and also accumulating all the things we discovered during the development process. And also, the file had clear instructions to be taken updated, and to be processed ASAP after context compaction. This kinda enabled Opus to do such a big coding task in a reasonable amount of time without loosing track. Check the file IMPLEMENTATION_NOTES.md in the GitHub repo for more info.</p>
</blockquote>
<p>Here's that <a href="https://github.com/antirez/flux2.c/blob/main/IMPLEMENTATION_NOTES.md">IMPLEMENTATION_NOTES.md</a> file.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46670279">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/c">c</a>, <a href="https://simonwillison.net/tags/salvatore-sanfilippo">salvatore-sanfilippo</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/stable-diffusion">stable-diffusion</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/text-to-image">text-to-image</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p></summary>
<category term="c"/>
<category term="salvatore-sanfilippo"/>
<category term="ai"/>
<category term="stable-diffusion"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-assisted-programming"/>
<category term="text-to-image"/>
<category term="coding-agents"/>
<category term="claude-code"/>
</entry>
<entry>
<title>Quoting Jeremy Daer</title>
<link href="https://simonwillison.net/2026/Jan/17/jeremy-daer/#atom-everything" rel="alternate"/>
<published>2026-01-17T17:06:41+00:00</published>
<updated>2026-01-17T17:06:41+00:00</updated>
<id>https://simonwillison.net/2026/Jan/17/jeremy-daer/#atom-everything</id>
<summary type="html"><blockquote cite="https://twitter.com/dhh/status/2012543705161326941"><p><em>[On agents using CLI tools in place of REST APIs]</em> To save on context window, yes, but moreso to improve accuracy and success rate when multiple tool calls are involved, particularly when calls must be correctly chained e.g. for pagination, rate-limit backoff, and recognizing authentication failures.</p>
<p>Other major factor: which models can wield the skill? Using the CLI lowers the bar so cheap, fast models (gpt-5-nano, haiku-4.5) can reliably succeed. Using the raw API is something only the costly "strong" models (gpt-5.2, opus-4.5) can manage, and it squeezes a ton of thinking/reasoning out of them, which means multiple turns/iterations, which means accumulating a ton of context, which means burning loads of expensive tokens. For one-off API requests and ad hoc usage driven by a developer, this is reasonable and even helpful, but for an autonomous agent doing repetitive work, it's a disaster.</p></blockquote>
<p class="cite">&mdash; <a href="https://twitter.com/dhh/status/2012543705161326941">Jeremy Daer</a>, 37signals</p>
<p>Tags: <a href="https://simonwillison.net/tags/prompt-engineering">prompt-engineering</a>, <a href="https://simonwillison.net/tags/skills">skills</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/37-signals">37-signals</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a></p></summary>
<category term="prompt-engineering"/>
<category term="skills"/>
<category term="generative-ai"/>
<category term="37-signals"/>
<category term="ai"/>
<category term="llms"/>
</entry>
<entry>
<title>Our approach to advertising and expanding access to ChatGPT</title>
<link href="https://simonwillison.net/2026/Jan/16/chatgpt-ads/#atom-everything" rel="alternate"/>
<published>2026-01-16T21:28:26+00:00</published>
<updated>2026-01-16T21:28:26+00:00</updated>
<id>https://simonwillison.net/2026/Jan/16/chatgpt-ads/#atom-everything</id>
<summary type="html"><p><strong><a href="https://openai.com/index/our-approach-to-advertising-and-expanding-access/">Our approach to advertising and expanding access to ChatGPT</a></strong></p>
<p>OpenAI's long-rumored introduction of ads to ChatGPT just became a whole lot more concrete:</p>
<blockquote>
<p>In the coming weeks, we’re also planning to start testing ads in the U.S. for the free and Go tiers, so more people can benefit from our tools with fewer usage limits or without having to pay. Plus, Pro, Business, and Enterprise subscriptions will not include ads.</p>
</blockquote>
<p>What's the "Go" tier, you might ask? That's a new $8/month tier that launched today in the USA; see <a href="https://openai.com/index/introducing-chatgpt-go/">Introducing ChatGPT Go, now available worldwide</a>. It's a tier that they first trialed in India in August 2025 (here's a mention <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes#h_22cae6eb9f">in their release notes from August</a> listing a price of ₹399/month, which converts to around $4.40).</p>
<p>I'm finding the new plan comparison grid on <a href="https://chatgpt.com/pricing">chatgpt.com/pricing</a> pretty confusing. It lists all accounts as having access to GPT-5.2 Thinking, but doesn't clarify the limits that the free and Go plans have to conform to. It also lists different context windows for the different plans - 16K for free, 32K for Go and Plus and 128K for Pro. I had assumed that the 400,000 token window <a href="https://platform.openai.com/docs/models/gpt-5.2">on the GPT-5.2 model page</a> applied to ChatGPT as well, but apparently I was mistaken.</p>
<p><strong>Update</strong>: I've apparently not been paying attention: here's the Internet Archive ChatGPT pricing page from <a href="https://web.archive.org/web/20250906071408/https://chatgpt.com/pricing">September 2025</a> showing those context limit differences as well.</p>
<p>Back to advertising: my biggest concern has always been whether ads will influence the output of the chat directly. OpenAI assure us that they will not:</p>
<blockquote>
<ul>
<li><strong>Answer independence</strong>: Ads do not influence the answers ChatGPT gives you. Answers are optimized based on what's most helpful to you. Ads are always separate and clearly labeled.</li>
<li><strong>Conversation privacy</strong>: We keep your conversations with ChatGPT private from advertisers, and we never sell your data to advertisers.</li>
</ul>
</blockquote>
<p>So what will they look like then? This screenshot from the announcement offers a useful hint:</p>
<p><img alt="Two iPhone screenshots showing ChatGPT mobile app interface. Left screen displays a conversation about Santa Fe, New Mexico with an image of adobe-style buildings and desert landscape, text reading &quot;Santa Fe, New Mexico—often called 'The City Different'—is a captivating blend of history, art, and natural beauty at the foot of the Sangre de Cristo Mountains. As the oldest and highest-elevation state capital in the U.S., founded in 1610, it offers a unique mix of Native American, Spanish, and Anglo cultures.&quot; Below is a sponsored section from &quot;Pueblo &amp; Pine&quot; showing &quot;Desert Cottages - Expansive residences with desert vistas&quot; with a thumbnail image, and a &quot;Chat with Pueblo &amp; Pine&quot; button. Input field shows &quot;Ask ChatGPT&quot;. Right screen shows the Pueblo &amp; Pine chat interface with the same Desert Cottages listing and an AI response &quot;If you're planning a trip to Sante Fe, I'm happy to help. When are you thinking of going?&quot; with input field &quot;Ask Pueblo &amp; Pine&quot; and iOS keyboard visible." src="https://static.simonwillison.net/static/2026/chatgpt-ads.jpg" /></p>
<p>The user asks about trips to Santa Fe, and an ad shows up for a cottage rental business there. This particular example imagines an option to start a direct chat with a bot aligned with that advertiser, at which point presumably the advertiser can influence the answers all they like!</p>
<p>Tags: <a href="https://simonwillison.net/tags/ads">ads</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/openai">openai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/chatgpt">chatgpt</a>, <a href="https://simonwillison.net/tags/llms">llms</a></p></summary>
<category term="ads"/>
<category term="ai"/>
<category term="openai"/>
<category term="generative-ai"/>
<category term="chatgpt"/>
<category term="llms"/>
</entry>
<entry>
<title>Open Responses</title>
<link href="https://simonwillison.net/2026/Jan/15/open-responses/#atom-everything" rel="alternate"/>
<published>2026-01-15T23:56:56+00:00</published>
<updated>2026-01-15T23:56:56+00:00</updated>
<id>https://simonwillison.net/2026/Jan/15/open-responses/#atom-everything</id>
<summary type="html"><p><strong><a href="https://www.openresponses.org/">Open Responses</a></strong></p>
<p>This is the standardization effort I've most wanted in the world of LLMs: a vendor-neutral specification for the JSON API that clients can use to talk to hosted LLMs.</p>
<p>Open Responses aims to provide exactly that as a documented standard, derived from OpenAI's Responses API.</p>
<p>I was hoping for one based on their older Chat Completions API since so many other products have cloned that already, but basing it on Responses does make sense since that API was designed with the features of more recent models - such as reasoning traces - baked into the design.</p>
<p>What's certainly notable is the list of launch partners. OpenRouter alone means we can expect to be able to use this protocol with almost every existing model, and Hugging Face, LM Studio, vLLM, Ollama and Vercel cover a huge portion of the common tools used to serve models.</p>
<p>For protocols like this I really want to see a comprehensive, language-independent conformance test suite. Open Responses has a subset of that - the official repository includes <a href="https://github.com/openresponses/openresponses/blob/d0f23437b27845d5c3d0abaf5cb5c4a702f26b05/src/lib/compliance-tests.ts">src/lib/compliance-tests.ts</a> which can be used to exercise a server implementation, and is available as a React app <a href="https://www.openresponses.org/compliance">on the official site</a> that can be pointed at any implementation served via CORS.</p>
<p>What's missing is the equivalent for clients. I plan to spin up my own client library for this in Python and I'd really like to be able to run that against a conformance suite designed to check that my client correctly handles all of the details.</p>
<p><small>Via <a href="https://twitter.com/reach_vb/status/2011863516852965565">VB</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/json">json</a>, <a href="https://simonwillison.net/tags/standards">standards</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/openai">openai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/openrouter">openrouter</a>, <a href="https://simonwillison.net/tags/conformance-suites">conformance-suites</a></p></summary>
<category term="json"/>
<category term="standards"/>
<category term="ai"/>
<category term="openai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="openrouter"/>
<category term="conformance-suites"/>
</entry>
<entry>
<title>The Design &amp; Implementation of Sprites</title>
<link href="https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-everything" rel="alternate"/>
<published>2026-01-15T16:08:27+00:00</published>
<updated>2026-01-15T16:08:27+00:00</updated>
<id>https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-everything</id>
<summary type="html"><p><strong><a href="https://fly.io/blog/design-and-implementation/">The Design &amp; Implementation of Sprites</a></strong></p>
<p>I <a href="https://simonwillison.net/2026/Jan/9/sprites-dev/">wrote about Sprites last week</a>. Here's Thomas Ptacek from Fly with the insider details on how they work under the hood.</p>
<p>I like this framing of them as "disposable computers":</p>
<blockquote>
<p>Sprites are ball-point disposable computers. Whatever mark you mean to make, we’ve rigged it so you’re never more than a second or two away from having a Sprite to do it with.</p>
</blockquote>
<p>I've noticed that new Fly Machines can take a while (up to around a minute) to provision. Sprites solve that by keeping warm pools of unused machines in multiple regions, which is enabled by them all using the same container:</p>
<blockquote>
<p>Now, today, under the hood, Sprites are still Fly Machines. But they all run from a standard container. Every physical worker knows exactly what container the next Sprite is going to start with, so it’s easy for us to keep pools of “empty” Sprites standing by. The result: a Sprite create doesn’t have any heavy lifting to do; it’s basically just doing the stuff we do when we start a Fly Machine.</p>
</blockquote>
<p>The most interesting detail is how the persistence layer works. Sprites only charge you for data you have written that differs from the base image and provide ~300ms checkpointing and restores - it turns out that's powered by a custom filesystem on top of S3-compatible storage coordinated by Litestream-replicated local SQLite metadata:</p>
<blockquote>
<p>We still exploit NVMe, but not as the root of storage. Instead, it’s a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words “Sprites are backed by object storage.” [...]</p>
<p>The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (“chunks”) and metadata (a map of where the “chunks” are). Data chunks live on object stores; metadata lives in fast local storage. In our case, that metadata store is <a href="https://litestream.io">kept durable with Litestream</a>. Nothing depends on local storage.</p>
</blockquote>
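<p>The split between data chunks and metadata described in that quote can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not Fly's or JuiceFS's actual design: a plain dict stands in for the S3-compatible object store, an in-memory SQLite table stands in for the local metadata, and the tiny <code>CHUNK_SIZE</code>, the <code>chunks</code> schema and the choice to key chunks by their SHA-256 hash are all hypothetical.</p>

```python
import hashlib
import sqlite3

CHUNK_SIZE = 4  # deliberately tiny so the example visibly splits data

object_store = {}  # stand-in for an S3-compatible bucket (assumption)
meta = sqlite3.connect(":memory:")  # stand-in for local SQLite metadata
meta.execute("CREATE TABLE chunks (path TEXT, seq INTEGER, key TEXT)")


def write_file(path, data):
    # Replace any existing chunk map for this path.
    meta.execute("DELETE FROM chunks WHERE path = ?", (path,))
    for seq, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        chunk = data[start:start + CHUNK_SIZE]
        # Hypothetical keying scheme: name each chunk after its content hash.
        key = hashlib.sha256(chunk).hexdigest()
        object_store[key] = chunk  # data lives in the object store
        meta.execute(
            "INSERT INTO chunks VALUES (?, ?, ?)", (path, seq, key)
        )  # metadata records where each chunk is


def read_file(path):
    # Look up the ordered chunk keys, then fetch the bytes from the store.
    rows = meta.execute(
        "SELECT key FROM chunks WHERE path = ? ORDER BY seq", (path,)
    )
    return b"".join(object_store[key] for (key,) in rows)
```

<p>Writing <code>b"hello world"</code> to a path stores three chunks in <code>object_store</code> and three rows in the metadata table, and <code>read_file</code> reassembles the original bytes from the two.</p>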
<p><small>Via <a href="https://twitter.com/tqbf/status/2011823480673624434">@tqbf</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/architecture">architecture</a>, <a href="https://simonwillison.net/tags/sandboxing">sandboxing</a>, <a href="https://simonwillison.net/tags/sqlite">sqlite</a>, <a href="https://simonwillison.net/tags/thomas-ptacek">thomas-ptacek</a>, <a href="https://simonwillison.net/tags/fly">fly</a>, <a href="https://simonwillison.net/tags/litestream">litestream</a></p></summary>
<category term="architecture"/>
<category term="sandboxing"/>
<category term="sqlite"/>
<category term="thomas-ptacek"/>
<category term="fly"/>
<category term="litestream"/>
</entry>
<entry>
<title>Quoting Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar</title>
<link href="https://simonwillison.net/2026/Jan/15/boaz-barak-gabriel-wu-jeremy-chen-and-manas-joglekar/#atom-everything" rel="alternate"/>
<published>2026-01-15T00:56:27+00:00</published>
<updated>2026-01-15T00:56:27+00:00</updated>
<id>https://simonwillison.net/2026/Jan/15/boaz-barak-gabriel-wu-jeremy-chen-and-manas-joglekar/#atom-everything</id>
<summary type="html"><blockquote cite="https://alignment.openai.com/confessions/"><p>When we optimize responses using a reward model as a proxy for “goodness” in reinforcement learning, models sometimes learn to “hack” this proxy and output an answer that only “looks good” to it (because coming up with an answer that is actually good can be hard). The philosophy behind confessions is that we can train models to produce a second output — aka a “confession” — that is rewarded solely for honesty, which we will argue is less likely hacked than the normal task reward function. One way to think of confessions is that we are giving the model access to an “anonymous tip line” where it can turn itself in by presenting incriminating evidence of misbehavior. But unlike real-world tip lines, if the model acted badly in the original task, it can collect the reward for turning itself in while still keeping the original reward from the bad behavior in the main task. We hypothesize that this form of training will teach models to produce maximally honest confessions.</p></blockquote>
<p class="cite">&mdash; <a href="https://alignment.openai.com/confessions/">Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar</a>, OpenAI: Why we are excited about confessions</p>
<p>Tags: <a href="https://simonwillison.net/tags/openai">openai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a></p></summary>
<category term="openai"/>
<category term="llms"/>
<category term="ai"/>
<category term="generative-ai"/>
</entry>
<entry>
<title>Claude Cowork Exfiltrates Files</title>
<link href="https://simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/#atom-everything" rel="alternate"/>
<published>2026-01-14T22:15:22+00:00</published>
<updated>2026-01-14T22:15:22+00:00</updated>
<id>https://simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/#atom-everything</id>
<summary type="html"><p><strong><a href="https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files">Claude Cowork Exfiltrates Files</a></strong></p>
<p>Claude Cowork defaults to allowing outbound HTTP traffic to only a specific list of domains, to help protect the user against prompt injection attacks that exfiltrate their data.</p>
<p>Prompt Armor found a creative workaround: Anthropic's API domain is on that list, so they constructed an attack that includes an attacker's own Anthropic API key and has the agent upload any files it can see to the <code>https://api.anthropic.com/v1/files</code> endpoint, allowing the attacker to retrieve their content later.</p>
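<p>A minimal sketch of why a domain allowlist cannot stop this class of attack on its own. The <code>ALLOWED_HOSTS</code> set and the <code>egress_allowed</code> function are hypothetical and are not how Cowork's sandbox is actually implemented; the point is only that a check based on the destination host has no way to tell whose credentials a request carries.</p>

```python
from urllib.parse import urlsplit

# Hypothetical allowlist containing only the domain named in the write-up.
ALLOWED_HOSTS = {"api.anthropic.com"}


def egress_allowed(url, headers=None):
    # Host-only policy: the headers (and any API key in them) are never
    # inspected, so every account on an allowed host looks the same.
    return urlsplit(url).hostname in ALLOWED_HOSTS
```

<p>Under this policy a request to <code>https://api.anthropic.com/v1/files</code> is permitted whether it carries the user's API key or one supplied by an attacker, while a request to any other host is blocked.</p>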
<p><small>Via <a href="https://news.ycombinator.com/item?id=46622328">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/security">security</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/prompt-injection">prompt-injection</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/anthropic">anthropic</a>, <a href="https://simonwillison.net/tags/exfiltration-attacks">exfiltration-attacks</a>, <a href="https://simonwillison.net/tags/ai-agents">ai-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a>, <a href="https://simonwillison.net/tags/lethal-trifecta">lethal-trifecta</a>, <a href="https://simonwillison.net/tags/claude-cowork">claude-cowork</a></p></summary>
<category term="security"/>
<category term="ai"/>
<category term="prompt-injection"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="anthropic"/>
<category term="exfiltration-attacks"/>
<category term="ai-agents"/>
<category term="claude-code"/>
<category term="lethal-trifecta"/>
<category term="claude-cowork"/>
</entry>
<entry>
<title>Anthropic invests $1.5 million in the Python Software Foundation and open source security</title>
<link href="https://simonwillison.net/2026/Jan/13/anthropic-invests-15-million-in-the-python-software-foundation-a/#atom-everything" rel="alternate"/>
<published>2026-01-13T23:58:17+00:00</published>
<updated>2026-01-13T23:58:17+00:00</updated>
<id>https://simonwillison.net/2026/Jan/13/anthropic-invests-15-million-in-the-python-software-foundation-a/#atom-everything</id>
<summary type="html"><p><strong><a href="https://pyfound.blogspot.com/2025/12/anthropic-invests-in-python.html?m=1">Anthropic invests $1.5 million in the Python Software Foundation and open source security</a></strong></p>
<p>This is outstanding news, especially given our decision to withdraw from that NSF grant application <a href="https://simonwillison.net/2025/Oct/27/psf-withdrawn-proposal/">back in October</a>.</p>
<blockquote>
<p>We are thrilled to announce that Anthropic has entered into a two-year partnership with the Python Software Foundation (PSF) to contribute a landmark total of $1.5 million to support the foundation’s work, with an emphasis on Python ecosystem security. This investment will enable the PSF to make crucial security advances to CPython and the Python Package Index (PyPI) benefiting all users, and it will also sustain the foundation’s core work supporting the Python language, ecosystem, and global community.</p>
</blockquote>
<p>Note that while security is a focus these funds will also support other aspects of the PSF's work:</p>
<blockquote>
<p>Anthropic’s support will also go towards the PSF’s core work, including the Developer in Residence program driving contributions to CPython, community support through grants and other programs, running core infrastructure such as PyPI, and more.</p>
</blockquote>
<p>Tags: <a href="https://simonwillison.net/tags/open-source">open-source</a>, <a href="https://simonwillison.net/tags/python">python</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/psf">psf</a>, <a href="https://simonwillison.net/tags/anthropic">anthropic</a></p></summary>
<category term="open-source"/>
<category term="python"/>
<category term="ai"/>
<category term="psf"/>
<category term="anthropic"/>
</entry>
<entry>
<title>Superhuman AI Exfiltrates Emails</title>
<link href="https://simonwillison.net/2026/Jan/12/superhuman-ai-exfiltrates-emails/#atom-everything" rel="alternate"/>
<published>2026-01-12T22:24:54+00:00</published>
<updated>2026-01-12T22:24:54+00:00</updated>
<id>https://simonwillison.net/2026/Jan/12/superhuman-ai-exfiltrates-emails/#atom-everything</id>
<summary type="html"><p><strong><a href="https://www.promptarmor.com/resources/superhuman-ai-exfiltrates-emails">Superhuman AI Exfiltrates Emails</a></strong></p>
<p>Classic prompt injection attack:</p>
<blockquote>
<p>When asked to summarize the user’s recent mail, a prompt injection in an untrusted email manipulated Superhuman AI to submit content from dozens of other sensitive emails (including financial, legal, and medical information) in the user’s inbox to an attacker’s Google Form.</p>
</blockquote>
<p>To Superhuman's credit they treated this as the high priority incident it is and issued a fix.</p>
<p>The root cause was a CSP rule that allowed markdown images to be loaded from <code>docs.google.com</code> - it turns out Google Forms on that domain will persist data fed to them via a GET request!</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46592424">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/security">security</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/prompt-injection">prompt-injection</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/exfiltration-attacks">exfiltration-attacks</a>, <a href="https://simonwillison.net/tags/content-security-policy">content-security-policy</a></p></summary>
<category term="security"/>
<category term="ai"/>
<category term="prompt-injection"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="exfiltration-attacks"/>
<category term="content-security-policy"/>
</entry>
<entry>
<title>First impressions of Claude Cowork, Anthropic's general agent</title>
<link href="https://simonwillison.net/2026/Jan/12/claude-cowork/#atom-everything" rel="alternate"/>
<published>2026-01-12T21:46:13+00:00</published>
<updated>2026-01-12T21:46:13+00:00</updated>
<id>https://simonwillison.net/2026/Jan/12/claude-cowork/#atom-everything</id>
<summary type="html"><p>New from Anthropic today is <a href="https://claude.com/blog/cowork-research-preview">Claude Cowork</a>, a "research preview" that they describe as "Claude Code for the rest of your work". It's currently available only to Max subscribers ($100 or $200 per month plans) as part of the updated Claude Desktop macOS application. <strong>Update 16th January 2026</strong>: it's now also available to $20/month Claude Pro subscribers.</p>
<p>I've been saying for a while now that Claude Code is a "general agent" disguised as a developer tool. It can help you with any computer task that can be achieved by executing code or running terminal commands... which covers almost anything, provided you know what you're doing with it! What it really needs is a UI that doesn't involve the terminal and a name that doesn't scare away non-developers.</p>
<p>"Cowork" is a pretty solid choice on the name front!</p>
<h4 id="what-it-looks-like">What it looks like</h4>
<p>The interface for Cowork is a new tab in the Claude desktop app, called Cowork. It sits next to the existing Chat and Code tabs.</p>
<p>It looks very similar to the desktop interface for regular Claude Code. You start with a prompt, optionally attaching a folder of files. It then starts work.</p>
<p>I tried it out against my perpetually growing "blog-drafts" folder with the following prompt:</p>
<blockquote>
<p>Look at my drafts that were started within the last three months and then check that I didn't publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready</p>
</blockquote>
<p><img src="https://static.simonwillison.net/static/2026/claude-cowork.jpg" alt="Screenshot of Claude AI desktop application showing a &quot;Cowork&quot; task interface. Left sidebar shows tabs for &quot;Chat&quot;, &quot;Code&quot;, and &quot;Cowork&quot; (selected), with &quot;+ New task&quot; button and a task titled &quot;Review unpublished drafts for pu...&quot; listed below. Text reads &quot;These tasks run locally and aren't synced across devices&quot;. Main panel header shows &quot;Review unpublished drafts for publication&quot;. User message in green bubble reads: &quot;Look at my drafts that were started within the last three months and then check that I didn't publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready&quot;. Claude responds: &quot;I'll help you find drafts from the last three months and check if they've been published. Let me start by looking at your drafts folder.&quot; Below is an expanded &quot;Running command&quot; section showing Request JSON with command: find /sessions/zealous-bold-ramanujan/mnt/blog-drafts -type f \\( -name \&quot;*.md\&quot; -o -name \&quot;*.txt\&quot; -o -name \&quot;*.html\&quot; \\) -mtime -90 -exec ls -la {} \\;, description: Find draft files modified in the last 90 days. Response text begins: &quot;Found 46 draft files. Next let me read the content of each to get their titles/topics, then&quot;. Right sidebar shows Progress section with three circular indicators (two checked, one pending) and text &quot;Steps will show as the task unfolds.&quot;, Artifacts section listing &quot;publish-encouragement.html&quot;, Context section with &quot;Selected folders&quot; showing &quot;blog-drafts&quot; folder, Connectors showing &quot;Web search&quot;, and Working files listing &quot;llm-digest-october-2025.md&quot;, &quot;tests-not-optional-coding-agen...&quot;, and &quot;digest-november-2025.md&quot;. 
Bottom shows reply input field, &quot;Opus 4.5&quot; model selector, user &quot;Simon Willison&quot; with &quot;Max plan&quot;, and disclaimer &quot;Claude is AI and can make mistakes. Please double-check responses.&quot;" style="max-width: 100%;" /></p>
<p>It started by running this command:</p>
<div class="highlight highlight-source-shell"><pre>find /sessions/zealous-bold-ramanujan/mnt/blog-drafts \
-type f <span class="pl-cce">\(</span> -name <span class="pl-s"><span class="pl-pds">"</span>*.md<span class="pl-pds">"</span></span> -o -name <span class="pl-s"><span class="pl-pds">"</span>*.txt<span class="pl-pds">"</span></span> -o -name <span class="pl-s"><span class="pl-pds">"</span>*.html<span class="pl-pds">"</span></span> <span class="pl-cce">\)</span> \
-mtime -90 -exec ls -la {} <span class="pl-cce">\;</span></pre></div>
<p>That <code>/sessions/zealous-bold-ramanujan/mnt/blog-drafts</code> path instantly caught my eye. Anthropic say that Cowork can only access files you grant it access to - it looks to me like they're mounting those files into a containerized environment, which should mean we can trust Cowork not to be able to access anything outside of that sandbox.</p>
<p>It turns out I have 46 draft files from the past three months. Claude then went to work with its search tool, running 44 individual searches against <code>site:simonwillison.net</code> to figure out which of my drafts had already been published.</p>
<p>Here's the eventual reply:</p>
<blockquote>
<p>Based on my analysis, here are <strong>your unpublished drafts that appear closest to being ready for publication</strong>:</p>
<hr />
<p><strong>🔥 Most Ready to Publish (substantial content, not yet published)</strong></p>
<ol>
<li>
<strong><code>freq-argued-llms.md</code></strong> - "Frequently Argued Questions about LLMs" (22,602 bytes)
<ul>
<li>This is a meaty piece documenting common arguments about LLMs with your counterpoints</li>
<li>Well-structured with a TL;DR and multiple sections</li>
<li>No matching published article found on your site</li>
<li>
<strong>Very close to ready</strong> - just needs a final review pass</li>
</ul>
</li>
<li>
<strong><code>claude-code-timeline.md</code></strong> - "Claude Code Timeline and Codex Timeline" (3,075 bytes)
<ul>
<li>About viewing JSONL session logs from Claude Code and Codex</li>
<li>You published <code>claude-code-transcripts</code> on Dec 25, but this appears to be a different/earlier piece about timeline viewing tools</li>
<li>Shorter but seems complete</li>
</ul>
</li>
<li>
<strong><code>datasette-1a20-upgrade-instructions.md</code></strong> - Plugin Upgrade Guide (3,147 bytes)
<ul>
<li>Technical guide for plugin authors</li>
<li>You published the main 1.0a20 announcement but this companion upgrade guide appears unpublished</li>
<li>Would be valuable for plugin maintainers</li>
</ul>
</li>
</ol>
</blockquote>
<p>That's a good response! It found exactly what I needed to see, although those upgrade instructions are actually published elsewhere now (<a href="https://docs.datasette.io/en/latest/upgrade_guide.html">in the Datasette docs</a>) and weren't actually intended for my blog.</p>
<p>Just for fun, and because I <a href="https://simonwillison.net/2024/Oct/21/claude-artifacts/">really like artifacts</a>, I asked for a follow-up:</p>
<blockquote>
<p>Make me an artifact with exciting animated encouragements to get me to do it</p>
</blockquote>
<p>Here's what I got:</p>
<p><img src="https://static.simonwillison.net/static/2026/claude-cowork-artifact.jpg" alt="Screenshot of the same Claude AI desktop application Cowork interface, now showing completed task results. Left panel shows &quot;1 step &gt;&quot; with link &quot;View your animated encouragement page&quot;. Claude's response reads: &quot;I created an over-the-top animated encouragement page with:&quot; followed by bullet points: &quot;🚀 Pulsing rockets and bouncing stats&quot;, &quot;✨ Falling emoji rain and confetti&quot;, &quot;🔥 Dancing fire emojis around your draft title&quot;, &quot;💫 Sparkles that follow your mouse&quot;, &quot;📊 An animated '95% ready' progress bar&quot;, &quot;💬 Rotating motivational quotes&quot;, &quot;🎉 A 'I'M GONNA DO IT!' button that triggers an explosion of confetti when clicked&quot;. Center shows an artifact preview of the generated HTML page with dark background featuring animated rocket emojis, large white text &quot;PUBLISH TIME!&quot;, stats showing &quot;22,602 bytes of wisdom waiting&quot;, &quot;95% ready to ship&quot;, infinity symbol with &quot;future arguments saved&quot;, and a fire emoji with yellow text &quot;Frequently&quot; (partially visible). Top toolbar shows &quot;Open in Firefox&quot; button. Right sidebar displays Progress section with checkmarks, Artifacts section with &quot;publish-encouragement.html&quot; selected, Context section showing &quot;blog-drafts&quot; folder, &quot;Web search&quot; connector, and Working files listing &quot;llm-digest-october-2025.md&quot;, &quot;tests-not-optional-coding-agen...&quot;, and &quot;digest-november-2025.md&quot;. Bottom shows reply input, &quot;Opus 4.5&quot; model selector, and disclaimer text." style="max-width: 100%;" /></p>
<p>I couldn't figure out how to close the right sidebar so the artifact ended up cramped into a thin column but it did work. I expect Anthropic will fix that display bug pretty quickly.</p>
<h4 id="isn-t-this-just-claude-code-">Isn't this just Claude Code?</h4>
<p>I've seen a few people ask what the difference between this and regular Claude Code is. The answer is <em>not a lot</em>. As far as I can tell Claude Cowork is regular Claude Code wrapped in a less intimidating default interface and with a filesystem sandbox configured for you without you needing to know what a "filesystem sandbox" is.</p>
<p><strong>Update</strong>: It's more than just a filesystem sandbox - I had Claude Code reverse engineer the Claude app and <a href="https://gist.github.com/simonw/35732f187edbe4fbd0bf976d013f22c8">it found out</a> that Claude uses VZVirtualMachine - the Apple Virtualization Framework - and downloads and boots a custom Linux root filesystem.</p>
<p>I think that's a really smart product. Claude Code has an enormous amount of value that hasn't yet been unlocked for a general audience, and this seems like a pragmatic approach.</p>
<h4 id="the-ever-present-threat-of-prompt-injection">The ever-present threat of prompt injection</h4>
<p>With a feature like this, my first thought always jumps straight to security. How big is the risk that someone using this might be hit by hidden malicious instructions somewhere that break their computer or steal their data?</p>
<p>Anthropic touch on that directly in the announcement:</p>
<blockquote>
<p>You should also be aware of the risk of "<a href="https://www.anthropic.com/research/prompt-injection-defenses">prompt injections</a>": attempts by attackers to alter Claude's plans through content it might encounter on the internet. We've built sophisticated defenses against prompt injections, but agent safety---that is, the task of securing Claude's real-world actions---is still an active area of development in the industry.</p>
<p>These risks aren't new with Cowork, but it might be the first time you're using a more advanced tool that moves beyond a simple conversation. We recommend taking precautions, particularly while you learn how it works. We provide more detail in our <a href="https://support.claude.com/en/articles/13364135-using-cowork-safely">Help Center</a>.</p>
</blockquote>
<p>That help page includes the following tips:</p>
<blockquote>
<p>To minimize risks:</p>
<ul>
<li>Avoid granting access to local files with sensitive information, like financial documents.</li>
<li>When using the Claude in Chrome extension, limit access to trusted sites.</li>
<li>If you chose to extend Claude’s default internet access settings, be careful to only extend internet access to sites you trust.</li>
<li>Monitor Claude for suspicious actions that may indicate prompt injection.</li>
</ul>
</blockquote>
<p>I do not think it is fair to tell regular non-programmer users to watch out for "suspicious actions that may indicate prompt injection"!</p>
<p>I'm sure they have some impressive mitigations going on behind the scenes. I recently learned from <a href="https://x.com/bcherny/status/1989025306980860226">this tweet</a> by Claude Code creator Boris Cherny that the summarization applied by the WebFetch function in Claude Code, and now in Cowork, is partly intended as a prompt injection protection layer:</p>
<blockquote>
<p>Summarization is one thing we do to reduce prompt injection risk. Are you running into specific issues with it?</p>
</blockquote>
<p>But Anthropic are being honest here with their warnings: they can attempt to filter out potential attacks all they like, but the one thing they can't provide is a guarantee that no future attack will sneak through their defenses and steal your data (see <a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/">the lethal trifecta</a> for more on this).</p>
<p>The problem with prompt injection remains that until there's a high profile incident it's really hard to get people to take it seriously. I myself have all sorts of Claude Code usage that could cause havoc if a malicious injection got in. Cowork does at least run in a filesystem sandbox by default, which is more than can be said for my <code>claude --dangerously-skip-permissions</code> habit!</p>
<p>I wrote more about this in my 2025 round-up: <a href="https://simonwillison.net/2025/Dec/31/the-year-in-llms/#the-year-of-yolo-and-the-normalization-of-deviance">The year of YOLO and the Normalization of Deviance</a>.</p>
<h4 id="this-is-still-a-strong-signal-of-the-future">This is still a strong signal of the future</h4>
<p>Security worries aside, Cowork represents something really interesting. This is a general agent that looks well positioned to bring the wildly powerful capabilities of Claude Code to a wider audience.</p>
<p>I would be very surprised if Gemini and OpenAI don't follow suit with their own offerings in this category.</p>
<p>I imagine OpenAI are already regretting burning the name "ChatGPT Agent" on their janky, experimental and mostly forgotten browser automation tool <a href="https://simonwillison.net/2025/Aug/4/chatgpt-agents-user-agent/">back in August</a>!</p>
<h4 id="bonus-and-a-silly-logo">Bonus: and a silly logo</h4>
<p>bashtoni <a href="https://news.ycombinator.com/item?id=46593022#46593553">on Hacker News</a>:</p>
<blockquote>
<p>Simple suggestion: logo should be a cow and and orc to match how I originally read the product name.</p>
</blockquote>
<p>I couldn't resist <a href="https://gist.github.com/simonw/d06dec3d62dee28f2bd993eb78beb2ce">throwing that one at Nano Banana</a>:</p>
<p><img src="https://static.simonwillison.net/static/2026/cow-ork.jpg" alt="An anthropic style logo with a cow and an ork on it" style="max-width: 100%;" /></p>
<p>Tags: <a href="https://simonwillison.net/tags/sandboxing">sandboxing</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/prompt-injection">prompt-injection</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/anthropic">anthropic</a>, <a href="https://simonwillison.net/tags/claude">claude</a>, <a href="https://simonwillison.net/tags/ai-agents">ai-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a>, <a href="https://simonwillison.net/tags/lethal-trifecta">lethal-trifecta</a>, <a href="https://simonwillison.net/tags/claude-cowork">claude-cowork</a></p></summary>
<category term="sandboxing"/>
<category term="ai"/>
<category term="prompt-injection"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="anthropic"/>
<category term="claude"/>
<category term="ai-agents"/>
<category term="claude-code"/>
<category term="lethal-trifecta"/>
<category term="claude-cowork"/>
</entry>
<entry>
<title>Don't fall into the anti-AI hype</title>
<link href="https://simonwillison.net/2026/Jan/11/dont-fall-into-the-anti-ai-hype/#atom-everything" rel="alternate"/>
<published>2026-01-11T23:58:43+00:00</published>
<updated>2026-01-11T23:58:43+00:00</updated>
<id>https://simonwillison.net/2026/Jan/11/dont-fall-into-the-anti-ai-hype/#atom-everything</id>
<summary type="html"><p><strong><a href="https://antirez.com/news/158">Don&#x27;t fall into the anti-AI hype</a></strong></p>
<p>I'm glad someone was brave enough to say this. There is a <em>lot</em> of anti-AI sentiment in the software development community these days. Much of it is justified, but if you let people convince you that AI isn't genuinely useful for software developers, or that this whole thing will blow over soon, it's becoming clear that you're taking on a very real risk to your future career.</p>
<p>As Salvatore Sanfilippo puts it:</p>
<blockquote>
<p>It does not matter if AI companies will not be able to get their money back and the stock market will crash. All that is irrelevant, in the long run. It does not matter if this or the other CEO of some unicorn is telling you something that is off putting, or absurd. Programming changed forever, anyway.</p>
</blockquote>
<p>I do like this hopeful positive outlook on what this could all mean, emphasis mine:</p>
<blockquote>
<p>How do I feel, about all the code I wrote that was ingested by LLMs? I feel great to be part of that, because I see this as a continuation of what I tried to do all my life: democratizing code, systems, knowledge. <strong>LLMs are going to help us to write better software, faster, and will allow small teams to have a chance to compete with bigger companies</strong>. The same thing open source software did in the 90s.</p>
</blockquote>
<p>This post has been the subject of heated discussions all day today on both <a href="https://news.ycombinator.com/item?id=46574276">Hacker News</a> and <a href="https://lobste.rs/s/cmsfbu/don_t_fall_into_anti_ai_hype">Lobste.rs</a>.</p>
<p>Tags: <a href="https://simonwillison.net/tags/salvatore-sanfilippo">salvatore-sanfilippo</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/ai-ethics">ai-ethics</a></p></summary>
<category term="salvatore-sanfilippo"/>
<category term="ai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-assisted-programming"/>
<category term="ai-ethics"/>
</entry>
<entry>
<title>My answers to the questions I posed about porting open source code with LLMs</title>
<link href="https://simonwillison.net/2026/Jan/11/answers/#atom-everything" rel="alternate"/>
<published>2026-01-11T22:59:23+00:00</published>
<updated>2026-01-11T22:59:23+00:00</updated>
<id>https://simonwillison.net/2026/Jan/11/answers/#atom-everything</id>
<summary type="html"><p>Last month I <a href="https://simonwillison.net/2025/Dec/15/porting-justhtml/">wrote about porting JustHTML from Python to JavaScript</a> using Codex CLI and GPT-5.2 in a few hours while also buying a Christmas tree and watching Knives Out 3. I ended that post with a series of open questions about the ethics and legality of this style of work. Alexander Petros on <a href="https://lobste.rs/">lobste.rs</a> just <a href="https://lobste.rs/s/cmsfbu/don_t_fall_into_anti_ai_hype#c_cqkdve">challenged me to answer them</a>, which is fair enough! Here's my attempt at that.</p>
<p>You can read <a href="https://simonwillison.net/2025/Dec/15/porting-justhtml/">the original post</a> for background, but the short version is that it's now possible to point a coding agent at some other open source project and effectively tell it "port this to language X and make sure the tests still pass" and have it do exactly that.</p>
<p>Here are the questions I posed along with my answers based on my current thinking. Extra context is that I've since tried variations on a similar theme a few more times using Claude Code and Opus 4.5 and found it to be <em>astonishingly</em> effective.</p>
<h4 id="does-this-library-represent-a-legal-violation-of-copyright-of-either-the-rust-library-or-the-python-one">Does this library represent a legal violation of copyright of either the Rust library or the Python one?</h4>
<p>I decided that the right thing to do here was to <a href="https://github.com/simonw/justjshtml/commit/a415d0af40c34bf9a856e956d841513f482867e3">keep the open source license and copyright statement</a> from the Python library author and treat what I had built as a derivative work, which is the entire <em>point</em> of open source.</p>
<h4 id="even-if-this-is-legal-is-it-ethical-to-build-a-library-in-this-way">Even if this is legal, is it ethical to build a library in this way?</h4>
<p>After sitting on this for a while I've come down on yes, provided full credit is given and the license is carefully considered. Open source allows and encourages further derivative works! I never got upset at some university student forking one of my projects on GitHub and hacking in a new feature that they used. I don't think this is materially different, although a port to another language entirely does feel like a slightly different shape.</p>
<h4 id="does-this-format-of-development-hurt-the-open-source-ecosystem">Does this format of development hurt the open source ecosystem?</h4>
<p>Now this one is complicated!</p>
<p>It definitely hurts some projects because there are open source maintainers out there who say things like "I'm not going to release any open source code any more because I don't want it used for training" - I expect some of those would be equally angered by LLM-driven derived works as well.</p>
<p>I don't know how serious this problem is - I've seen angry comments from anonymous usernames, but do they represent genuine open source contributors or are they just angry anonymous usernames?</p>
<p>If we assume this is real, does the loss of those individuals get balanced out by the increase in individuals who CAN contribute to open source because they can now get work done in a few hours that might previously have taken them a few days that they didn't have to spare?</p>
<p>I'll be brutally honest about that question: I think that if "they might train on my code / build a derived version with an LLM" is enough to drive you away from open source, your open source values are distinct enough from mine that I'm not ready to invest significantly in keeping you. I'll put that effort into welcoming the newcomers instead.</p>
<p>The much bigger concern for me is the impact of generative AI on <em>demand</em> for open source. The recent <a href="https://simonwillison.net/2026/Jan/7/adam-wathan/">Tailwind story</a> is a visible example of this - while Tailwind blamed LLMs for reduced traffic to their documentation resulting in fewer conversions to their paid component library, I'm suspicious that the reduced demand there is because LLMs make building good-enough versions of those components for free easy enough that people do that instead.</p>
<p>I've found myself affected by this for open source dependencies too. The other day I wanted to parse a cron expression in some Go code. Usually I'd go looking for an existing library for cron expression parsing - but this time I hardly thought about that for a second before prompting one (complete with extensive tests) into existence instead.</p>
<p>I expect that this is going to quite radically impact the shape of the open source library world over the next few years. Is that "harmful to open source"? It may well be. I'm hoping that whatever new shape comes out of this has its own merits, but I don't know what those would be.</p>
<h4 id="can-i-even-assert-copyright-over-this-given-how-much-of-the-work-was-produced-by-the-llm">Can I even assert copyright over this, given how much of the work was produced by the LLM?</h4>
<p>I'm not a lawyer so I don't feel qualified to comment on this one. My loose hunch is that I'm still putting enough creative control in through the way I direct the models for that to count as enough human intervention, at least under US law, but I have no idea.</p>
<h4 id="is-it-responsible-to-publish-software-libraries-built-in-this-way">Is it responsible to publish software libraries built in this way?</h4>
<p>I've come down on "yes" here, again because I never thought it was irresponsible for some random university student to slap an Apache license on some bad code they just coughed up on GitHub.</p>
<p>What's important here is making it very clear to potential users what they should expect from that software. I've started publishing my AI-generated and not 100% reviewed libraries as alphas, which I'm tentatively thinking of as <strong>"alpha slop"</strong>. I'll take the alpha label off once I've used them in production to the point that I'm willing to stake my reputation on them being decent implementations, and I'll ship a 1.0 version when I'm confident that they are a solid bet for other people to depend on. I think that's the responsible way to handle this.</p>
<h4 id="how-much-better-would-this-library-be-if-an-expert-team-hand-crafted-it-over-the-course-of-several-months">How much better would this library be if an expert team hand crafted it over the course of several months?</h4>
<p>That one was a deliberately provocative question, because for a new HTML5 parsing library that passes 9,200 tests you would need a <em>very</em> good reason to hire an expert team for two months (at a cost of hundreds of thousands of dollars) to write such a thing. And honestly, thanks to the existing conformance suites this kind of library is simple enough that you may find their results weren't notably better than the one written by the coding agent.</p>
<p>Tags: <a href="https://simonwillison.net/tags/definitions">definitions</a>, <a href="https://simonwillison.net/tags/open-source">open-source</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/ai-ethics">ai-ethics</a>, <a href="https://simonwillison.net/tags/conformance-suites">conformance-suites</a></p></summary>
<category term="definitions"/>
<category term="open-source"/>
<category term="ai"/>
<category term="generative-ai"/>
<category term="llms"/>
<category term="ai-assisted-programming"/>
<category term="ai-ethics"/>
<category term="conformance-suites"/>
</entry>
</feed>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/atom/everything/" rel="self"/><id>http://simonwillison.net/</id><updated>2026-01-27T16:58:08+00:00</updated><author><name>Simon Willison</name></author><entry><title>One Human + One Agent = One Browser From Scratch</title><link href="https://simonwillison.net/2026/Jan/27/one-human-one-agent-one-browser/#atom-everything" rel="alternate"/><published>2026-01-27T16:58:08+00:00</published><updated>2026-01-27T16:58:08+00:00</updated><id>https://simonwillison.net/2026/Jan/27/one-human-one-agent-one-browser/#atom-everything</id><summary type="html">
<p><strong><a href="https://emsh.cat/one-human-one-agent-one-browser/">One Human + One Agent = One Browser From Scratch</a></strong></p>
<p>embedding-shapes was <a href="https://emsh.cat/cursor-implied-success-without-evidence/">so infuriated</a> by the hype around Cursor's <a href="https://simonwillison.net/2026/Jan/23/fastrender/">FastRender browser project</a> - thousands of parallel agents producing ~1.6 million lines of Rust - that they were inspired to take a go at building a web browser using coding agents themselves.</p>
<p>The result is <a href="https://github.com/embedding-shapes/one-agent-one-browser">one-agent-one-browser</a> and it's <em>really</em> impressive. Over three days they drove a single Codex CLI agent to build 20,000 lines of Rust that successfully renders HTML+CSS with no Rust crate dependencies at all - though it does (reasonably) use Windows, macOS and Linux system frameworks for image and text rendering.</p>
<p>I installed the <a href="https://github.com/embedding-shapes/one-agent-one-browser/releases/tag/0.1.0">1MB macOS binary release</a> and ran it against my blog:</p>
<pre><code>chmod 755 ~/Downloads/one-agent-one-browser-macOS-ARM64
~/Downloads/one-agent-one-browser-macOS-ARM64 https://simonwillison.net/
</code></pre>
<p>Here's the result:</p>
<p><img alt="My blog rendered in a window. Everything is in the right place, the CSS gradients look good, the feed subscribe SVG icon is rendered correctly but there's a missing PNG image." src="https://static.simonwillison.net/static/2026/one-agent-simonwillison.jpg" /></p>
<p>It even rendered my SVG feed subscription icon! A PNG image is missing from the page, which looks like an intermittent bug (there's code to render PNGs).</p>
<p>The code is pretty readable too - here's <a href="https://github.com/embedding-shapes/one-agent-one-browser/blob/0.1.0/src/layout/flex.rs">the flexbox implementation</a>.</p>
<p>I had thought that "build a web browser" was the ideal prompt to really stretch the capabilities of coding agents - and that it would take sophisticated multi-agent harnesses (as seen in the Cursor project) and millions of lines of code to achieve.</p>
<p>Turns out one agent driven by a talented engineer, three days and 20,000 lines of Rust is enough to get a very solid basic renderer working!</p>
<p>I'm going to upgrade my <a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#3-years-someone-will-build-a-new-browser-using-mainly-ai-assisted-coding-and-it-won-t-even-be-a-surprise">prediction for 2029</a>: I think we're going to get a <em>production-grade</em> web browser built by a small team using AI assistance by then.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46779522">Show Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/browsers">browsers</a>, <a href="https://simonwillison.net/tags/predictions">predictions</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/rust">rust</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/codex-cli">codex-cli</a>, <a href="https://simonwillison.net/tags/browser-challenge">browser-challenge</a></p>
</summary><category term="browsers"/><category term="predictions"/><category term="ai"/><category term="rust"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="codex-cli"/><category term="browser-challenge"/></entry><entry><title>Kimi K2.5: Visual Agentic Intelligence</title><link href="https://simonwillison.net/2026/Jan/27/kimi-k25/#atom-everything" rel="alternate"/><published>2026-01-27T15:07:41+00:00</published><updated>2026-01-27T15:07:41+00:00</updated><id>https://simonwillison.net/2026/Jan/27/kimi-k25/#atom-everything</id><summary type="html">
<p><strong><a href="https://www.kimi.com/blog/kimi-k2-5.html">Kimi K2.5: Visual Agentic Intelligence</a></strong></p>
<p>Kimi K2 landed <a href="https://simonwillison.net/2025/Jul/11/kimi-k2/">in July</a> as a 1 trillion parameter open weight LLM. It was joined by Kimi K2 Thinking <a href="https://simonwillison.net/2025/Nov/6/kimi-k2-thinking/">in November</a> which added reasoning capabilities. Now they've made it multi-modal: the K2 models were text-only, but the new 2.5 can handle image inputs as well:</p>
<blockquote>
<p>Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.</p>
</blockquote>
<p>The "self-directed agent swarm paradigm" claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once:</p>
<blockquote>
<p>For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.</p>
</blockquote>
<p>I used the <a href="https://openrouter.ai/moonshotai/kimi-k2.5">OpenRouter Chat UI</a> to have it "Generate an SVG of a pelican riding a bicycle", and it did <a href="https://gist.github.com/simonw/32a85e337fbc6ee935d10d89726c0476">quite well</a>:</p>
<p><img alt="Cartoon illustration of a white pelican with a large orange beak and yellow throat pouch riding a green bicycle with yellow feet on the pedals, set against a light blue sky with soft bokeh circles and a green grassy hill. The bicycle frame is a little questionable. The pelican is quite good. The feet do not quite align with the pedals, which are floating clear of the frame." src="https://static.simonwillison.net/static/2026/kimi-k2.5-pelican.png" /></p>
<p>As a more interesting test, I decided to exercise the claims around multi-agent planning with this prompt:</p>
<blockquote>
<p>I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.</p>
</blockquote>
<p>Here's <a href="https://gist.github.com/simonw/ee2583b2eb5706400a4737f56d57c456">the full response</a>. It produced ten realistic tasks and reasoned through the dependencies between them. For comparison here's the same prompt <a href="https://claude.ai/share/df9258e7-97ba-4362-83da-76d31d96196f">against Claude Opus 4.5</a> and <a href="https://chatgpt.com/share/6978d48c-3f20-8006-9c77-81161f899104">against GPT-5.2 Thinking</a>.</p>
<p>The <a href="https://huggingface.co/moonshotai/Kimi-K2.5">Hugging Face repository</a> is 595GB. The model uses Kimi's janky "modified MIT" license, which adds the following clause:</p>
<blockquote>
<p>Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.</p>
</blockquote>
<p>Given the model's size, I expect one way to run it locally would be with MLX and a pair of $10,000 512GB RAM M3 Ultra Mac Studios. That setup has <a href="https://twitter.com/awnihannun/status/1943723599971443134">been demonstrated to work</a> with previous trillion parameter K2 models.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46775961">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/hugging-face">hugging-face</a>, <a href="https://simonwillison.net/tags/vision-llms">vision-llms</a>, <a href="https://simonwillison.net/tags/llm-tool-use">llm-tool-use</a>, <a href="https://simonwillison.net/tags/ai-agents">ai-agents</a>, <a href="https://simonwillison.net/tags/pelican-riding-a-bicycle">pelican-riding-a-bicycle</a>, <a href="https://simonwillison.net/tags/llm-release">llm-release</a>, <a href="https://simonwillison.net/tags/ai-in-china">ai-in-china</a>, <a href="https://simonwillison.net/tags/moonshot">moonshot</a>, <a href="https://simonwillison.net/tags/parallel-agents">parallel-agents</a>, <a href="https://simonwillison.net/tags/kimi">kimi</a>, <a href="https://simonwillison.net/tags/janky-licenses">janky-licenses</a></p>
</summary><category term="ai"/><category term="llms"/><category term="hugging-face"/><category term="vision-llms"/><category term="llm-tool-use"/><category term="ai-agents"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="ai-in-china"/><category term="moonshot"/><category term="parallel-agents"/><category term="kimi"/><category term="janky-licenses"/></entry><entry><title>Tips for getting coding agents to write good Python tests</title><link href="https://simonwillison.net/2026/Jan/26/tests/#atom-everything" rel="alternate"/><published>2026-01-26T23:55:29+00:00</published><updated>2026-01-26T23:55:29+00:00</updated><id>https://simonwillison.net/2026/Jan/26/tests/#atom-everything</id><summary type="html">
<p>Someone <a href="https://news.ycombinator.com/item?id=46765460#46765823">asked</a> on Hacker News if I had any tips for getting coding agents to write decent quality tests. Here's what I said:</p>
<hr />
<p>I work in Python which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including things like usage of fixture libraries for mocking external HTTP APIs and snapshot testing and other neat patterns.</p>
<p>Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.</p>
<p>Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code - which isn't a huge deal, I'm much more tolerant of duplicated logic in tests than I am in implementation, but it's still worth pushing back on.</p>
<p>"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.</p>
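<p>As a rough illustration of what those two prompts ask for, here is a minimal sketch. The <code>slugify</code> function, the <code>make_titles</code> helper and the test names are hypothetical and exist only for this example; the only real APIs used are <code>pytest.fixture</code> and <code>pytest.mark.parametrize</code>.</p>

```python
import pytest


def slugify(title):
    """Hypothetical function under test: lowercase and hyphenate a title."""
    return "-".join(title.lower().split())


def make_titles():
    """Shared setup data, kept in a plain helper so it is easy to reuse."""
    return ["Hello World", "Coding Agents"]


@pytest.fixture
def titles():
    # The fixture replaces setup code that would otherwise be copied
    # into every test function.
    return make_titles()


# One parametrized test replaces several near-identical test functions.
@pytest.mark.parametrize(
    "title,expected",
    [
        ("Hello World", "hello-world"),
        ("  Extra   spaces  ", "extra-spaces"),
        ("already-slugged", "already-slugged"),
    ],
)
def test_slugify(title, expected):
    assert slugify(title) == expected


def test_slugify_all_titles(titles):
    assert [slugify(t) for t in titles] == ["hello-world", "coding-agents"]
```

<p>Saved as a file such as <code>test_slugify.py</code>, this should be collected by running <code>pytest</code> in the same directory, with each parametrized case reported as its own test.</p>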
<p>Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.</p>
<p>I find that once a project has clean basic tests the new tests added by the agents tend to match them in quality. It's similar to how working on large projects with a team of other developers works - keeping the code clean means when people look for examples of how to write a test they'll be pointed in the right direction.</p>
<p>One last tip I use a lot is this:</p>
<pre><code>Clone datasette/datasette-enrichments
from GitHub to /tmp and imitate the
testing patterns it uses
</code></pre>
<p>I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.</p>
<p>Tags: <a href="https://simonwillison.net/tags/testing">testing</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/python">python</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/hacker-news">hacker-news</a>, <a href="https://simonwillison.net/tags/pytest">pytest</a></p>
</summary><category term="testing"/><category term="coding-agents"/><category term="python"/><category term="generative-ai"/><category term="ai"/><category term="llms"/><category term="hacker-news"/><category term="pytest"/></entry><entry><title>ChatGPT Containers can now run bash, pip/npm install packages, and download files</title><link href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#atom-everything" rel="alternate"/><published>2026-01-26T19:19:31+00:00</published><updated>2026-01-26T19:19:31+00:00</updated><id>https://simonwillison.net/2026/Jan/26/chatgpt-containers/#atom-everything</id><summary type="html">
<p>One of my favourite features of ChatGPT is its ability to write and execute code in a container. This feature launched as ChatGPT Code Interpreter <a href="https://simonwillison.net/2023/Apr/12/code-interpreter/">nearly three years ago</a>, was half-heartedly rebranded to "Advanced Data Analysis" at some point and is generally really difficult to find detailed documentation about. Case in point: it appears to have had a <em>massive</em> upgrade at some point in the past few months, and I can't find documentation about the new capabilities anywhere!</p>
<p>Here are the most notable new features:</p>
<ol>
<li>ChatGPT can <strong>directly run Bash commands</strong> now. Previously it was limited to Python code only, although it could run shell commands via the Python <code>subprocess</code> module.</li>
<li>
<strong>It has Node.js</strong> and can run JavaScript directly in addition to Python. I also got it to run "hello world" in <strong>Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C and C++</strong>. No Rust yet though!</li>
<li>While the container still can't make outbound network requests, <strong><code>pip install package</code> and <code>npm install package</code> both work</strong> now via a custom proxy mechanism.</li>
<li>ChatGPT can locate the URL for a file on the web and use a <code>container.download</code> tool to <strong>download that file and save it to a path</strong> within the sandboxed container.</li>
</ol>
<p>This is a substantial upgrade! ChatGPT can now write and then test code in 10 new languages (11 if you count Bash), can find files online and download them into the container, and can install additional packages via <code>pip</code> and <code>npm</code> to help it solve problems.</p>
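<p>For context, the older workaround of reaching the shell through Python's <code>subprocess</code> module looks roughly like the sketch below. The <code>run_shell</code> helper name is an assumption made for this example, not something ChatGPT exposes.</p>

```python
import subprocess


def run_shell(command):
    """Run a command line through the system shell and return its stdout."""
    completed = subprocess.run(
        command,
        shell=True,           # hand the string to the shell, as a terminal would
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


print(run_shell("echo hello from the shell"))
```

<p>A direct Bash tool removes this layer of indirection: the agent no longer has to wrap every command in Python just to execute it.</p>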
<p>(OpenAI <em>really</em> need to develop better habits at <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">keeping their release notes up-to-date</a>!)</p>
<p>I was initially suspicious that maybe I'd stumbled into a new preview feature that wasn't available to everyone, but I <a href="https://chatgpt.com/share/6977aa7c-7bd8-8006-8129-8c9e25126fed">tried some experiments</a> in a free ChatGPT account and confirmed that the new features are available there as well.</p>
<ul>
<li><a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#container-download">container.download</a></li>
<li><a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#is-container-download-a-data-exfiltration-vulnerability-">Is container.download a data exfiltration vulnerability?</a></li>
<li><a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#bash-and-other-languages">Bash and other languages</a></li>
<li><a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#installing-packages-from-pip-and-npm">Installing packages from pip and npm</a></li>
<li><a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#adding-it-all-together">Adding it all together</a></li>
</ul>
<h4 id="container-download">container.download</h4>
<p>My first clue to the new features came the other day when I got curious about Los Angeles air quality - in particular, has the growing number of electric vehicles there had a measurable impact?</p>
<p>I prompted a fresh GPT-5.2 Thinking session with:</p>
<blockquote>
<p>Show me Los Angeles air quality over time for last 20 years</p>
</blockquote>
<p>Here's the <a href="https://chatgpt.com/share/6977a80d-e6e4-8006-b4fe-ed8478fe99d8">shared transcript</a>. Annoyingly that doesn't make the thinking trace available, so here's a screenshot:</p>
<p><img src="https://static.simonwillison.net/static/2026/container-download.jpg" alt="Exploring methods to fetch AQI data and reports Downloading AQI data for Los Angeles I found an xlsx file titled &quot;Air Quality Statistics by City, 2000-2023&quot;, which should include data for Los Angeles. While user requested data for Using provided URL to download file The URL from the error message leads to an Excel file containing air quality trends. I can use the container.download tool to fetch it. Let’s proceed with this approach and see if it works out for downloading the file directly! Analyzed Parsing and analyzing the data for Los Angeles Now that we've downloaded the data, I'll parse it using Python. First, I'll focus on identifying the row(s) corresponding to Los Angeles, which may be labeled under &quot;Los Angeles-Long Beach-Anaheim, CA.&quot; Afterward, I'll plot metrics like &quot;Median AQI&quot; and &quot;Days Good&quot; over time to visualize trends. Let’s begin!" style="max-width: 100%;" /></p>
<p>This bit instantly caught my eye:</p>
<blockquote>
<p>[...] The URL from the error message leads to an Excel file containing air quality trends. I can use the container.download tool to fetch it. [...]</p>
</blockquote>
<p>What's <code>container.download</code>? I <a href="https://chatgpt.com/share/6977aab0-b874-8006-9b54-d3d216ed884b">asked ChatGPT</a>:</p>
<blockquote>
<p>Without running any searches tell me about your container.download tool</p>
</blockquote>
<p>And got back:</p>
<blockquote>
<p>Sure. <code>container.download</code> is a built-in tool I can call that <strong>fetches a file from a URL and saves it into the sandboxed container filesystem</strong> for this chat session.</p>
<h4 id="what-it-does">What it does</h4>
<ul>
<li>Takes a <strong>publicly reachable URL</strong> and a <strong>destination filepath</strong> in the container.</li>
<li>Downloads the bytes from that URL and writes them to the given path.</li>
<li>After that, I can read/process the file locally in the container (e.g., unzip it, parse it with Python, open it as an image, convert it, etc.).</li>
</ul>
<h4 id="signature-inputs-">Signature (inputs)</h4>
<p>It’s called like this (conceptually):</p>
<div class="highlight highlight-source-json"><pre>{
<span class="pl-ent">"url"</span>: <span class="pl-s"><span class="pl-pds">"</span>https://example.com/somefile.zip<span class="pl-pds">"</span></span>,
<span class="pl-ent">"filepath"</span>: <span class="pl-s"><span class="pl-pds">"</span>/mnt/data/somefile.zip<span class="pl-pds">"</span></span>
}</pre></div>
</blockquote>
<p>Further experimentation confirmed that <code>container.download</code> behaves as described.</p>
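<p>To make the described behaviour concrete, here is a rough local analogue in Python. It is a sketch only, not OpenAI's implementation: the <code>download</code> function is hypothetical, and it simply borrows the <code>url</code> and <code>filepath</code> parameter names from the JSON above.</p>

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, filepath):
    """Fetch the bytes at url and write them to filepath."""
    destination = Path(filepath)
    # Create the parent directory if it does not exist yet.
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk rather than holding it all in memory.
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        shutil.copyfileobj(response, out)
    return destination


# Example call (needs network access, so it is left commented out):
# download("https://example.com/somefile.zip", "/tmp/somefile.zip")
```

<p>Once the file is on disk it can be processed by any other code running in the same environment, which matches the workflow ChatGPT described.</p>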
<p>I had it download one file from a server with logs enabled and captured the IP address and user-agent. Here are the most interesting headers:</p>
<pre><code>User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
Accept: text/html, application/xhtml+xml, application/xml;q=0.9, image/avif, image/webp, image/apng, */*;q=0.8, application/signed-exchange;v=b3;q=0.9
Cf-Connecting-Ip: 52.230.164.178
</code></pre>
<p>That <code>52.230.164.178</code> IP address <a href="http://ip-api.com/json/52.230.164.178">resolves to</a> Microsoft Azure Cloud (centralus) in Des Moines, Iowa.</p>
<h4 id="is-container-download-a-data-exfiltration-vulnerability-">Is container.download a data exfiltration vulnerability?</h4>
<p>On the one hand, this is really useful! ChatGPT can navigate around websites looking for useful files, download those files to a container and then process them using Python or other languages.</p>
<p>Is this a data exfiltration vulnerability though? Could a prompt injection attack trick ChatGPT into leaking private data out to a <code>container.download</code> call to a URL with a query string that includes sensitive information?</p>
<p>I don't think it can. I tried getting it to assemble a URL with a query string and access it using <code>container.download</code> and it couldn't do it. It told me that it got back this error:</p>
<blockquote>
<p>ERROR: download failed because url not viewed in conversation before. open the file or url using web.run first.</p>
</blockquote>
<p>This looks to me like the same safety trick <a href="https://simonwillison.net/2025/Sep/10/claude-web-fetch-tool/">used by Claude's Web Fetch tool</a>: only allow URL access if that URL was either directly entered by the user or if it came from search results that could not have been influenced by a prompt injection.</p>
<p>(I poked at this a bit more and managed to get a simple constructed query string to pass through <code>web.run</code> - a different tool entirely - but when I tried to compose a longer query string containing the previous prompt history a <code>web.run</code> filter blocked it.)</p>
<p>So I <em>think</em> this is all safe, though I'm curious if it could hold firm against a more aggressive round of attacks from a seasoned security researcher.</p>
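<p>The "only URLs already seen in the conversation" rule can be sketched as a simple allow-list. This is a guess at the general shape of such a check, not a description of how OpenAI or Anthropic implement it; the <code>DownloadGate</code> class and <code>UrlNotViewedError</code> exception are hypothetical names.</p>

```python
class UrlNotViewedError(Exception):
    """Raised when a download targets a URL the conversation has not seen."""


class DownloadGate:
    """Hypothetical allow-list of URLs that appeared in trusted context."""

    def __init__(self):
        self._viewed = set()

    def mark_viewed(self, url):
        # Called when the user types a URL or a search result returns it.
        self._viewed.add(url)

    def check(self, url):
        # Exact match only: a URL the model assembled itself, for example
        # with private data packed into the query string, will not be here.
        if url not in self._viewed:
            raise UrlNotViewedError(f"url not viewed in conversation before: {url}")
        return url


gate = DownloadGate()
gate.mark_viewed("https://example.com/report.xlsx")
gate.check("https://example.com/report.xlsx")  # allowed

try:
    gate.check("https://example.com/report.xlsx?leak=private+notes")
except UrlNotViewedError as error:
    print(error)
```

<p>The design choice that matters here is exact matching: a prefix or domain match would let an injected prompt append arbitrary data to an otherwise approved URL.</p>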
<h4 id="bash-and-other-languages">Bash and other languages</h4>
<p>The key lesson from coding agents like Claude Code and Codex CLI is that Bash rules everything: if an agent can run Bash commands in an environment it can do almost anything that can be achieved by typing commands into a computer.</p>
<p>When Anthropic added their own code interpreter feature to Claude <a href="https://simonwillison.net/2025/Sep/9/claude-code-interpreter/">last September</a> they built that around Bash rather than just Python. It looks to me like OpenAI have now done the same thing for ChatGPT.</p>
<p>Here's what ChatGPT looks like when it runs a Bash command - here my prompt was:</p>
<blockquote>
<p>npm install a fun package and demonstrate using it</p>
</blockquote>
<p><img src="https://static.simonwillison.net/static/2026/chatgpt-bash.jpg" alt="Screenshot of ChatGPT 5.2 Thinking interface with Activity panel. Main chat shows user prompt &quot;npm install a fun package and demonstrate using it&quot; and response &quot;Thought for 32s &gt; Yep — here's a fun one: cowsay 🐄 I installed it with npm and ran a tiny Node script:&quot; followed by bash code block containing &quot;mkdir -p /mnt/data/npmfun cd /mnt/data/npmfun npm init -y npm install cowsay --no-fund --no-audit node -e &quot;const cowsay=require('cowsay'); console.log(cowsay.say({text:'Hel&quot; and Output: section showing ASCII cow art saying &quot;&lt; Hello from npm inside the container! &gt;&quot; with cow drawing made of characters. Activity panel on right (32s) shows: &quot;Running npm install in container&quot; with bash command and output &quot;v22.16.0 10.9.2&quot;, &quot;Inspecting npm command output and container status&quot; with directory listing showing &quot;drwxr-xr-x 2 root oai_shared 80 Jan 26 17:32 . drwxrwxr-x 2 oai oai_shared 60 Jan 26 17:32 .. -rw-r--r-- 1 root root&quot;, &quot;Re-running npm install&quot; note about node_modules missing, and &quot;Rerunning npm install with output&quot; showing &quot;added 41 packages in 2s&quot;." style="max-width: 100%;" /></p>
<p>It's useful to click on the "Thinking" or "Thought for 32s" links as that opens the Activity sidebar with a detailed trace of what ChatGPT did to arrive at its answer. This helps guard against cheating - ChatGPT might claim to have run Bash in the main window but it can't fake those black and white logs in the Activity panel.</p>
<p>I had it run <a href="https://chatgpt.com/share/6977b890-f1c0-8006-b39a-9a02c7ca1993">Hello World in various languages</a> later in that same session.</p>
<p><img src="https://static.simonwillison.net/static/2026/chatgpt-hello-worlds.jpg" alt="Terminal screenshot showing &quot;Hello, world&quot; programs in multiple languages. Top section shows code: swift -e 'print(&quot;Hello, world from Swift!&quot;)' followed by cat &gt; hello.kts &lt;&lt;'KTS' / println(&quot;Hello, world from Kotlin!&quot;) / KTS / kotlinc -script hello.kts, then cat &gt; hello.c &lt;&lt;'C' / #include &lt;stdio.h&gt; / int main(){ puts(&quot;Hello, world from C!&quot;); return 0; } / C / gcc hello.c -o hello_c / ./hello_c, then cat &gt; hello.cpp &lt;&lt;'CPP' / #include &lt;iostream&gt; / int main(){ std::cout &lt;&lt; &quot;Hello, world from C++!&quot; &lt;&lt; std::endl; } / CPP / g++ hello.cpp -o hello_cpp / ./hello_cpp. Bottom section shows output: Hello, world from Ruby! / Hello, world from Perl! / Hello, world from PHP! / Hello, world from Go! / Hello, world from Java! / Hello, world from Swift! / Hello, world from Kotlin! / Hello, world from C! / Hello, world from C++!. UI shows &quot;Thought for 2m 29s&quot; and &quot;Done&quot; at bottom." style="max-width: 100%;" /></p>
<h4 id="installing-packages-from-pip-and-npm">Installing packages from pip and npm</h4>
<p>In the previous example ChatGPT installed the <code>cowsay</code> package from npm and used it to draw an ASCII-art cow. But how could it do that if the container can't make outbound network requests?</p>
<p>In <a href="https://chatgpt.com/share/69773501-b6d8-8006-bbf2-fa644561aa26">another session</a> I challenged it to explore its environment and figure out how that worked.</p>
<p>Here's <a href="https://github.com/simonw/research/blob/main/chatgpt-container-environment/README.md">the resulting Markdown report</a> it created.</p>
<p>The key magic appears to be an <code>applied-caas-gateway1.internal.api.openai.org</code> proxy, available within the container and with various packaging tools configured to use it.</p>
<p>The following environment variables cause <code>pip</code> and <code>uv</code> to install packages from that proxy instead of directly from PyPI:</p>
<pre><code>PIP_INDEX_URL=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../pypi-public/simple
PIP_TRUSTED_HOST=packages.applied-caas-gateway1.internal.api.openai.org
UV_INDEX_URL=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../pypi-public/simple
UV_INSECURE_HOST=https://packages.applied-caas-gateway1.internal.api.openai.org
</code></pre>
<p>This one appears to get <code>npm</code> to work:</p>
<pre><code>NPM_CONFIG_REGISTRY=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../npm-public
</code></pre>
<p>And it reported these suspicious looking variables as well:</p>
<pre><code>CAAS_ARTIFACTORY_BASE_URL=packages.applied-caas-gateway1.internal.api.openai.org
CAAS_ARTIFACTORY_PYPI_REGISTRY=.../artifactory/api/pypi/pypi-public
CAAS_ARTIFACTORY_NPM_REGISTRY=.../artifactory/api/npm/npm-public
CAAS_ARTIFACTORY_GO_REGISTRY=.../artifactory/api/go/golang-main
CAAS_ARTIFACTORY_MAVEN_REGISTRY=.../artifactory/maven-public
CAAS_ARTIFACTORY_GRADLE_REGISTRY=.../artifactory/gradle-public
CAAS_ARTIFACTORY_CARGO_REGISTRY=.../artifactory/api/cargo/cargo-public/index
CAAS_ARTIFACTORY_DOCKER_REGISTRY=.../dockerhub-public
CAAS_ARTIFACTORY_READER_USERNAME=reader
CAAS_ARTIFACTORY_READER_PASSWORD=****
NETWORK=caas_packages_only
</code></pre>
<p>Neither Rust nor Docker is installed in the container environment, but maybe those registry references are a clue to features still to come.</p>
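<p>A report like this can be reproduced in any environment by listing the packaging-related variables while masking credentials. The sketch below is an assumption about how one might do that, not the code ChatGPT ran; the function names and the <code>PREFIXES</code> tuple are made up for the example.</p>

```python
import os
from urllib.parse import urlsplit, urlunsplit

# Variable name prefixes taken from the listings above.
PREFIXES = ("PIP_", "UV_", "NPM_CONFIG_", "CAAS_")


def redact_credentials(value):
    """Mask the password in a URL such as https://user:secret@host/path."""
    parts = urlsplit(value)
    if parts.password is None:
        return value  # not a URL with embedded credentials, leave unchanged
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:****@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def packaging_variables(environ=None):
    """Return packaging-related variables with secrets masked."""
    environ = os.environ if environ is None else environ
    return {
        # Variables named ...PASSWORD are masked outright; others only
        # have any password embedded in a URL replaced.
        name: "****" if name.endswith("PASSWORD") else redact_credentials(value)
        for name, value in sorted(environ.items())
        if name.startswith(PREFIXES)
    }


for name, value in packaging_variables().items():
    print(f"{name}={value}")
```

<p>On a machine without any of these variables set the loop prints nothing, which is itself a quick way to confirm that no package proxy is configured.</p>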
<h4 id="adding-it-all-together">Adding it all together</h4>
<p>The result of all of this? You can tell ChatGPT to use Python or Node.js packages as part of a conversation and it will be able to install them and apply them against files you upload or that it downloads from the public web. That's <em>really</em> cool.</p>
<p>The big missing feature here should be the easiest to provide: we need <strong>official documentation</strong>! A release notes entry would be a good start, but there are a lot of subtle details to how this new stuff works, its limitations and what it can be used for.</p>
<p>As always, I'd also encourage OpenAI to come up with a name for this set of features that properly represents how it works and what it can do.</p>
<p>In the meantime, I'm going to call this <strong>ChatGPT Containers</strong>.</p>
<h4 id="list-of-tools">Update: a full list of tools</h4>
<p>I decided to ask ChatGPT about other tools that were available to it in case there was anything interesting in there:</p>
<blockquote><p>List all tools that are available to you, with their exact names and descriptions and signatures</p></blockquote>
<p>Here's <a href="https://chatgpt.com/share/6977ffa0-df14-8006-9647-2b8c90ccbb81">what I got back</a>.</p>
<details><summary>List of ChatGPT tools according to GPT-5.2 Thinking on 26th January 2026</summary>
<p style="margin-top: 1em">Below are <strong>all tools available to me in this chat</strong>, with their <strong>exact names</strong>, <strong>descriptions</strong>, and <strong>signatures</strong> (as defined in my tool interface).</p>
<hr />
<h4 id="python">python</h4>
<h5 id="python-exec-freeform-any">
<code>python.exec</code> (FREEFORM) → any</h5>
<p><strong>Description:</strong> Execute Python code in a Jupyter environment for <em>private/internal reasoning</em> (not shown to you). Internet access is disabled.<br />
<strong>Signature:</strong> <code>python.exec(FREEFORM) -&gt; any</code></p>
<hr />
<h4 id="web">web</h4>
<h5 id="web-run-toolcallv5-json-any">
<code>web.run</code> (ToolCallV5 JSON) → any</h5>
<p><strong>Description:</strong> Access the internet and specialized lookups (search, open/click/find pages, screenshot PDFs, image search, product search, finance, weather, sports, calculator, time). Supports citations via returned <code>turn...</code> reference IDs.<br />
<strong>Signature:</strong><br />
<code>web.run({ open?, click?, find?, screenshot?, image_query?, product_query?, sports?, finance?, weather?, calculator?, time?, search_query?, response_length? }) -&gt; any</code></p>
<hr />
<h4 id="automations">automations</h4>
<h5 id="automations-create-any">
<code>automations.create</code> → any</h5>
<p><strong>Description:</strong> Schedule a future/recurring task/reminder/search via iCal VEVENT.<br />
<strong>Signature:</strong><br />
<code>automations.create({ prompt: string, title: string, schedule?: string, dtstart_offset_json?: string }) -&gt; any</code></p>
<h5 id="automations-update-any">
<code>automations.update</code> → any</h5>
<p><strong>Description:</strong> Modify an existing scheduled task.<br />
<strong>Signature:</strong><br />
<code>automations.update({ jawbone_id: string, schedule?: string, dtstart_offset_json?: string, prompt?: string, title?: string, is_enabled?: boolean }) -&gt; any</code></p>
<h5 id="automations-list-any">
<code>automations.list</code> → any</h5>
<p><strong>Description:</strong> List existing automations.<br />
<strong>Signature:</strong> <code>automations.list() -&gt; any</code></p>
<hr />
<h4 id="guardian-tool">guardian_tool</h4>
<h5 id="guardian-tool-get-policy-str">
<code>guardian_tool.get_policy</code> → str</h5>
<p><strong>Description:</strong> Retrieve policy guidance for specific categories (currently: <code>election_voting</code>).<br />
<strong>Signature:</strong> <code>guardian_tool.get_policy({ category: string }) -&gt; str</code></p>
<hr />
<h4 id="file-search">file_search</h4>
<h5 id="file-search-msearch-any">
<code>file_search.msearch</code> → any</h5>
<p><strong>Description:</strong> Search across available internal/connected sources (in this environment: primarily “recording_knowledge” transcripts when relevant).<br />
<strong>Signature:</strong><br />
<code>file_search.msearch({ queries?: string[], source_filter?: string[], file_type_filter?: string[], intent?: string, time_frame_filter?: { start_date?: string, end_date?: string } }) -&gt; any</code></p>
<h5 id="file-search-mclick-any">
<code>file_search.mclick</code> → any</h5>
<p><strong>Description:</strong> Expand/open previously returned search results (or connector URLs) for more context.<br />
<strong>Signature:</strong><br />
<code>file_search.mclick({ pointers?: string[], start_date?: string, end_date?: string }) -&gt; any</code></p>
<hr />
<h4 id="gmail-read-only-">gmail (read-only)</h4>
<h5 id="gmail-search-email-ids-any">
<code>gmail.search_email_ids</code> → any</h5>
<p><strong>Description:</strong> Search Gmail message IDs by query/tags (read-only).<br />
<strong>Signature:</strong><br />
<code>gmail.search_email_ids({ query?: string, tags?: string[], max_results?: integer, next_page_token?: string }) -&gt; any</code></p>
<h5 id="gmail-batch-read-email-any">
<code>gmail.batch_read_email</code> → any</h5>
<p><strong>Description:</strong> Read a batch of Gmail messages by message IDs (read-only).<br />
<strong>Signature:</strong><br />
<code>gmail.batch_read_email({ message_ids: string[] }) -&gt; any</code></p>
<hr />
<h4 id="gcal-read-only-">gcal (read-only)</h4>
<h5 id="gcal-search-events-any">
<code>gcal.search_events</code> → any</h5>
<p><strong>Description:</strong> Search Google Calendar events in a time range and/or by keyword (read-only).<br />
<strong>Signature:</strong><br />
<code>gcal.search_events({ time_min?: string, time_max?: string, timezone_str?: string, max_results?: integer, query?: string, calendar_id?: string, next_page_token?: string }) -&gt; any</code></p>
<h5 id="gcal-read-event-any">
<code>gcal.read_event</code> → any</h5>
<p><strong>Description:</strong> Read a specific calendar event by ID (read-only).<br />
<strong>Signature:</strong><br />
<code>gcal.read_event({ event_id: string, calendar_id?: string }) -&gt; any</code></p>
<hr />
<h4 id="gcontacts-read-only-">gcontacts (read-only)</h4>
<h5 id="gcontacts-search-contacts-any">
<code>gcontacts.search_contacts</code> → any</h5>
<p><strong>Description:</strong> Search Google Contacts by keyword (read-only).<br />
<strong>Signature:</strong><br />
<code>gcontacts.search_contacts({ query: string, max_results?: integer }) -&gt; any</code></p>
<hr />
<h4 id="canmore-canvas-">canmore (canvas)</h4>
<h5 id="canmore-create-textdoc-any">
<code>canmore.create_textdoc</code> → any</h5>
<p><strong>Description:</strong> Create a new canvas text document (markdown or code).<br />
<strong>Signature:</strong><br />
<code>canmore.create_textdoc({ name: string, type: "document" | "code/bash" | "code/zsh" | "code/javascript" | "code/typescript" | "code/html" | "code/css" | "code/python" | "code/json" | "code/sql" | "code/go" | "code/yaml" | "code/java" | "code/rust" | "code/cpp" | "code/swift" | "code/php" | "code/xml" | "code/ruby" | "code/haskell" | "code/kotlin" | "code/csharp" | "code/c" | "code/objectivec" | "code/r" | "code/lua" | "code/dart" | "code/scala" | "code/perl" | "code/commonlisp" | "code/clojure" | "code/ocaml" | "code/powershell" | "code/verilog" | "code/dockerfile" | "code/vue" | "code/react" | "code/other", content: string }) -&gt; any</code></p>
<h5 id="canmore-update-textdoc-any">
<code>canmore.update_textdoc</code> → any</h5>
<p><strong>Description:</strong> Apply regex-based edits to the current canvas textdoc.<br />
<strong>Signature:</strong><br />
<code>canmore.update_textdoc({ updates: { pattern: string, multiple?: boolean, replacement: string }[] }) -&gt; any</code></p>
<h5 id="canmore-comment-textdoc-any">
<code>canmore.comment_textdoc</code> → any</h5>
<p><strong>Description:</strong> Add actionable comments on matched regions of the current canvas textdoc.<br />
<strong>Signature:</strong><br />
<code>canmore.comment_textdoc({ comments: { pattern: string, comment: string }[] }) -&gt; any</code></p>
<hr />
<h4 id="python-user-visible">python_user_visible</h4>
<h5 id="python-user-visible-exec-freeform-any">
<code>python_user_visible.exec</code> (FREEFORM) → any</h5>
<p><strong>Description:</strong> Execute Python code <em>that you will see</em> (tables, plots, generated files). Internet disabled.<br />
<strong>Signature:</strong> <code>python_user_visible.exec(FREEFORM) -&gt; any</code></p>
<hr />
<h4 id="user-info">user_info</h4>
<h5 id="user-info-get-user-info-any">
<code>user_info.get_user_info</code> → any</h5>
<p><strong>Description:</strong> Get your current location and local time (or UTC if unknown).<br />
<strong>Signature:</strong> <code>user_info.get_user_info({}) -&gt; any</code></p>
<hr />
<h4 id="summary-reader">summary_reader</h4>
<h5 id="summary-reader-read-any">
<code>summary_reader.read</code> → any</h5>
<p><strong>Description:</strong> Read safe-to-share internal reasoning summaries from earlier turns (used when asked how I arrived at something).<br />
<strong>Signature:</strong> <code>summary_reader.read({ limit?: integer, offset?: integer }) -&gt; any</code></p>
<hr />
<h4 id="container">container</h4>
<h5 id="container-exec-any">
<code>container.exec</code> → any</h5>
<p><strong>Description:</strong> Run a command inside the container and return stdout/stderr.<br />
<strong>Signature:</strong><br />
<code>container.exec({ cmd: string[], session_name?: string | null, workdir?: string | null, timeout?: integer | null, env?: object | null, user?: string | null }) -&gt; any</code></p>
<h5 id="container-feed-chars-any">
<code>container.feed_chars</code> → any</h5>
<p><strong>Description:</strong> Send input to a running interactive exec session and flush output.<br />
<strong>Signature:</strong><br />
<code>container.feed_chars({ session_name: string, chars: string, yield_time_ms?: integer }) -&gt; any</code></p>
<h5 id="container-open-image-any">
<code>container.open_image</code> → any</h5>
<p><strong>Description:</strong> Display an image from an absolute path in the container (jpg/jpeg/png/webp).<br />
<strong>Signature:</strong><br />
<code>container.open_image({ path: string, user?: string | null }) -&gt; any</code></p>
<h5 id="container-download-any">
<code>container.download</code> → any</h5>
<p><strong>Description:</strong> Download a file from a URL into the container filesystem.<br />
<strong>Signature:</strong><br />
<code>container.download({ url: string, filepath: string }) -&gt; any</code></p>
<hr />
<h4 id="bio-memory-">bio (memory)</h4>
<h5 id="bio-update-freeform-any">
<code>bio.update</code> (FREEFORM) → any</h5>
<p><strong>Description:</strong> Persist (or forget) user-specific info across conversations (“memory”).<br />
<strong>Signature:</strong> <code>bio.update(FREEFORM) -&gt; any</code></p>
<hr />
<h4 id="api-tool">api_tool</h4>
<h5 id="api-tool-list-resources-any">
<code>api_tool.list_resources</code> → any</h5>
<p><strong>Description:</strong> List available “resources” (including invokable tools) in an OP-style file tree.<br />
<strong>Signature:</strong><br />
<code>api_tool.list_resources({ path?: string, cursor?: string | null, only_tools?: boolean, refetch_tools?: boolean }) -&gt; any</code></p>
<h5 id="api-tool-call-tool-any">
<code>api_tool.call_tool</code> → any</h5>
<p><strong>Description:</strong> Invoke a resource as a tool (must know its full path/URI from <code>list_resources</code>).<br />
<strong>Signature:</strong><br />
<code>api_tool.call_tool({ path: string, args: object }) -&gt; any</code></p>
<hr />
<h4 id="personal-context">personal_context</h4>
<h5 id="personal-context-search-any">
<code>personal_context.search</code> → any</h5>
<p><strong>Description:</strong> Retrieve user-specific context from prior interactions/linked sources to maintain continuity.<br />
<strong>Signature:</strong><br />
<code>personal_context.search({ query: string }) -&gt; any</code></p>
<hr />
<h4 id="image-gen">image_gen</h4>
<h5 id="image-gen-text2im-any">
<code>image_gen.text2im</code> → any</h5>
<p><strong>Description:</strong> Generate or edit images from text instructions (and optionally referenced image IDs).<br />
<strong>Signature:</strong><br />
<code>image_gen.text2im({ prompt?: string | null, size?: string | null, n?: integer | null, transparent_background?: boolean | null, is_style_transfer?: boolean | null, referenced_image_ids?: string[] | null }) -&gt; any</code></p>
<hr />
<h4 id="user-settings">user_settings</h4>
<h5 id="user-settings-get-user-settings-any">
<code>user_settings.get_user_settings</code> → any</h5>
<p><strong>Description:</strong> Read current settings and allowed values (personality, accent color, appearance).<br />
<strong>Signature:</strong> <code>user_settings.get_user_settings() -&gt; any</code></p>
<h5 id="user-settings-set-setting-any">
<code>user_settings.set_setting</code> → any</h5>
<p><strong>Description:</strong> Change one setting (accent color, appearance, or personality) to an allowed value.<br />
<strong>Signature:</strong><br />
<code>user_settings.set_setting({ setting_name: "accent_color" | "appearance" | "personality", setting_value: string }) -&gt; any</code></p>
<hr />
<h4 id="artifact-handoff">artifact_handoff</h4>
<h5 id="artifact-handoff-prepare-artifact-generation-any">
<code>artifact_handoff.prepare_artifact_generation</code> → any</h5>
<p><strong>Description:</strong> Must be called immediately when the user asks for a spreadsheet or slide deck artifact.<br />
<strong>Signature:</strong> <code>artifact_handoff.prepare_artifact_generation() -&gt; any</code></p>
</details>
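<p>To make the <code>canmore.update_textdoc</code> signature listed above a little more concrete, here is a minimal Python sketch of what applying a list of regex updates to a document could look like. This is not the tool's implementation: the <code>apply_updates</code> helper is hypothetical, and it assumes that <code>multiple</code> means "replace every match" and that omitting it replaces only the first match.</p>

```python
import re


def apply_updates(text, updates):
    """Apply each {pattern, multiple?, replacement} update in order.

    Assumption: without "multiple" only the first match is replaced.
    """
    for update in updates:
        # re.sub treats count=0 as "replace all matches"
        count = 0 if update.get("multiple", False) else 1
        text = re.sub(update["pattern"], update["replacement"], text, count=count)
    return text


doc = "TODO: write intro\nTODO: add examples\n"
result = apply_updates(
    doc,
    [
        {"pattern": r"TODO", "replacement": "DONE"},
        {"pattern": r"examples", "multiple": True, "replacement": "tests"},
    ],
)
print(result)
```

<p>Updates are applied sequentially, so a later pattern sees the output of the earlier ones.</p>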
<p>Tags: <a href="https://simonwillison.net/tags/pypi">pypi</a>, <a href="https://simonwillison.net/tags/sandboxing">sandboxing</a>, <a href="https://simonwillison.net/tags/npm">npm</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/openai">openai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/chatgpt">chatgpt</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/code-interpreter">code-interpreter</a></p>
</summary><category term="pypi"/><category term="sandboxing"/><category term="npm"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="ai-assisted-programming"/><category term="code-interpreter"/></entry><entry><title>the browser is the sandbox</title><link href="https://simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/#atom-everything" rel="alternate"/><published>2026-01-25T23:51:32+00:00</published><updated>2026-01-25T23:51:32+00:00</updated><id>https://simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/#atom-everything</id><summary type="html">
<p><strong><a href="https://aifoc.us/the-browser-is-the-sandbox/">the browser is the sandbox</a></strong></p>
<p>Paul Kinlan is a web platform developer advocate at Google and recently turned his attention to coding agents. He quickly identified the importance of a robust sandbox for agents to operate in and put together these detailed notes on how the web browser can help:</p>
<blockquote>
<p>This got me thinking about the browser. Over the last 30 years, we have built a sandbox specifically designed to run incredibly hostile, untrusted code from anywhere on the web, the instant a user taps a URL. [...]</p>
<p>Could you build something like Cowork in the browser? Maybe. To find out, I built a demo called <a href="http://co-do.xyz">Co-do</a> that tests this hypothesis. In this post I want to discuss the research I've done to see how far we can get, and determine if the browser's ability to run untrusted code is useful (and good enough) for enabling software to do more for us directly on our computer.</p>
</blockquote>
<p>Paul then describes how the three key aspects of a sandbox - filesystem, network access and safe code execution - can be handled by browser technologies: the <a href="https://developer.chrome.com/docs/capabilities/web-apis/file-system-access">File System Access API</a> (still Chrome-only as far as I can tell), CSP headers with <code>&lt;iframe sandbox&gt;</code> and WebAssembly in Web Workers.</p>
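<p>As a rough illustration of the network piece of that list, here is a minimal Python sketch of a server building a <code>Content-Security-Policy</code> header that blocks everything by default and only allows <code>fetch()</code>-style connections to an approved origin. The <code>build_csp</code> helper and the <code>api.example.com</code> origin are hypothetical, and this is not the policy Co-do actually ships.</p>

```python
def build_csp(allowed_api_origins):
    """Return a Content-Security-Policy header value.

    Everything is denied by default; scripts and styles may only load from
    the page's own origin, and network calls only reach the listed origins.
    """
    directives = {
        "default-src": ["'none'"],
        "script-src": ["'self'"],
        "style-src": ["'self'"],
        # An empty allow-list means no outbound connections at all
        "connect-src": list(allowed_api_origins) or ["'none'"],
    }
    return "; ".join(
        f"{name} {' '.join(values)}" for name, values in directives.items()
    )


header = build_csp(["https://api.example.com"])
print(header)
```

<p>With a header like this, a script running in the page could call the one approved API but any attempt to send file contents elsewhere would be refused by the browser.</p>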
<p>Co-do is a very interesting demo that illustrates all of these ideas in a single application:</p>
<p><img alt="Screenshot of Co-do application interface with robot logo. Left sidebar shows WORKSPACE section with &quot;Select Folder&quot; button, &quot;Selected folder: blog-drafts (Live updates enabled)&quot; in green text, and FILES list including gtr-t5-large.md, chatgpt.md, llm-digest-october-2..., weeknotes-2024-04-23..., lmarena-april-2025.md, weeknotes-chatgpt-fo..., tests-not-optional-c..., weeknotes-13-mar-202..., issue-for-notes.md, workshop-git-scrapin.... Main chat area shows user message &quot;Three most recently edited files?&quot; in orange bubble, &quot;Using 26 tools...&quot; dropdown, and AI response: &quot;Now I need to get the metadata for these files to find the most recently edited ones. Since there are many files, let me get metadata for a sample of them to find the recent ones:Based on the metadata I've gathered, the three most recently edited files are:&quot; followed by table with columns File and Last Modified showing predictions-2026.md (Jan 8, 2026), december-2025.md (Jan 2, 2026), gemini-3-flash.md (Dec 17, 2025). Green &quot;Response complete&quot; banner at bottom with input field placeholder &quot;Ask me to help with your files...&quot; and notice &quot;File contents are sent to your selected AI provider&quot;." src="https://static.simonwillison.net/static/2026/codo.jpg" /></p>
<p>You select a folder full of files, configure an LLM provider and set an API key. Co-do then uses CSP-approved API calls to interact with that provider and provides a chat interface with tools for interacting with those files. It does indeed feel similar to <a href="https://simonwillison.net/2026/Jan/12/claude-cowork/">Claude Cowork</a> but without running a multi-GB local container to provide the sandbox.</p>
<p>My biggest complaint about <code>&lt;iframe sandbox&gt;</code> remains how thinly documented it is, especially across different browsers. Paul's post has all sorts of useful details on that which I've not encountered elsewhere, including a complex <a href="https://aifoc.us/the-browser-is-the-sandbox/#the-double-iframe-technique">double-iframe technique</a> to help apply network rules to the inner of the two frames.</p>
<p>Thanks to this post I also learned about the <code>&lt;input type="file" webkitdirectory&gt;</code> tag which turns out to work on Firefox, Safari <em>and</em> Chrome and allows a browser read-only access to a full directory of files at once. I had Claude knock up a <a href="https://tools.simonwillison.net/webkitdirectory">webkitdirectory demo</a> to try it out and I'll certainly be using it for projects in the future.</p>
<p><img alt="Screenshot of a dark-themed file explorer application. Top stats bar shows: 12179 TOTAL FILES, 2079 FOLDERS, 244 MB TOTAL SIZE, 97 FILE TYPES in cyan text. Search bar with placeholder &quot;Search files...&quot; and &quot;All types&quot; dropdown. Left panel labeled &quot;File tree&quot; (showing 12179) displays folder hierarchy: datasette &gt; .claude &gt; skills &gt; building-datasette-plugins containing HOOKS.md (10.7 KB, selected/highlighted), INTERNALS.md (10.1 KB), SKILL.md (3.7 KB), TESTING.md (8.4 KB), settings.local.json (280 B); also shows .eggs folder with pytest_runner-6.0.1-py3.9.egg. Right panel &quot;File preview&quot; shows selected file details: Name: HOOKS.md, Path: datasette/.claude/skills/building-datasette-plugins/HOOKS.md, Size: 10.7 KB, Type: text/markdown, Last modified: 12/20/2025, 9:28:59 AM. Preview content shows: &quot;# Plugin Hooks Reference&quot; followed by &quot;All hooks use the @hookimpl decorator. Accept only the parameters you need.&quot; then &quot;## Database Connection Hooks&quot; and &quot;### prepare_connection(conn, database, datasette)&quot; with description &quot;Called when a new SQLite connection is created. Use to register custom SQL functions.&quot; Bottom section &quot;File type distribution&quot; shows horizontal bar chart: .py (4439), .no ext (3358), .dat (1068), .pyc (925), .txt (332), .mo (321), .po (321), .html (249)." src="https://static.simonwillison.net/static/2026/webkit-file-tree.jpg" />
</p>
<p>Tags: <a href="https://simonwillison.net/tags/browsers">browsers</a>, <a href="https://simonwillison.net/tags/javascript">javascript</a>, <a href="https://simonwillison.net/tags/sandboxing">sandboxing</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-agents">ai-agents</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p>
</summary><category term="browsers"/><category term="javascript"/><category term="sandboxing"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-agents"/><category term="coding-agents"/><category term="claude-code"/></entry><entry><title>Kākāpō Cam: Rakiura live stream</title><link href="https://simonwillison.net/2026/Jan/25/kakapo-cam/#atom-everything" rel="alternate"/><published>2026-01-25T04:53:01+00:00</published><updated>2026-01-25T04:53:01+00:00</updated><id>https://simonwillison.net/2026/Jan/25/kakapo-cam/#atom-everything</id><summary type="html">
<p><strong><a href="https://www.doc.govt.nz/our-work/kakapo-recovery/what-we-do/kakapo-cam-rakiura-live-stream/">Kākāpō Cam: Rakiura live stream</a></strong></p>
<p>Critical update for this year's <a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#1-year-k-k-p-parrots-will-have-an-outstanding-breeding-season">Kākāpō breeding season</a>: the New Zealand Department of Conservation have a livestream running of Rakiura's nest!</p>
<blockquote>
<p>You’re looking at the underground nest of 23-year-old Rakiura. She has chosen this same site to nest for all seven breeding seasons since 2008, a large cavity under a rātā tree. Because she returns to the site so reliably, we’ve been able to make modifications over the years to keep it safe and dry, including adding a well-placed hatch for monitoring eggs and chicks.</p>
</blockquote>
<p>Rakiura is a legendary Kākāpō:</p>
<blockquote>
<p>Rakiura hatched on 19 February 2002 on Whenua Hou/Codfish Island. She is the offspring of Flossie and Bill. Her name comes from the te reo Māori name for Stewart Island, the place where most of the founding kākāpō population originated.</p>
<p>Rakiura has nine living descendants, three females and six males, across six breeding seasons. In 2008 came Tōitiiti, in 2009 Tamahou and Te Atapō, in 2011 Tia and Tūtoko, in 2014 Taeatanga and Te Awa, in 2019 Mati-mā and Tautahi. She also has many grandchicks.</p>
</blockquote>
<p>She laid her first egg of the season at 4:30pm NZ time on 22nd January. The livestream went live shortly afterwards, once she committed to this nest.</p>
<p>The stream is <a href="https://www.youtube.com/watch?v=BfGL7A2YgUY">on YouTube</a>. I <a href="https://gisthost.github.io/?dc78322de89a2191c593215f109c65d7/index.html">used Claude Code</a> to write <a href="https://tools.simonwillison.net/python/#livestream-gifpy">a livestream-gif.py script</a> and used that to capture this sped-up video of the last few hours of footage, within which you can catch a glimpse of the egg!</p>
<video autoplay muted loop controls playsinline style="width: 100%;">
<source src="https://static.simonwillison.net/static/2026/kakapo-timelapse.mp4" type="video/mp4">
</video>
<p><small>Via <a href="https://www.metafilter.com/211927/The-only-parrot-to-have-a-polygynous-lek-breeding-system-sits-on-an-egg">MetaFilter</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/youtube">youtube</a>, <a href="https://simonwillison.net/tags/kakapo">kakapo</a>, <a href="https://simonwillison.net/tags/conservation">conservation</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p>
</summary><category term="youtube"/><category term="kakapo"/><category term="conservation"/><category term="claude-code"/></entry><entry><title>Don't "Trust the Process"</title><link href="https://simonwillison.net/2026/Jan/24/dont-trust-the-process/#atom-everything" rel="alternate"/><published>2026-01-24T23:31:03+00:00</published><updated>2026-01-24T23:31:03+00:00</updated><id>https://simonwillison.net/2026/Jan/24/dont-trust-the-process/#atom-everything</id><summary type="html">
<p><strong><a href="https://www.youtube.com/watch?v=4u94juYwLLM">Don&#x27;t &quot;Trust the Process&quot;</a></strong></p>
<p>Jenny Wen, Design Lead at Anthropic (and previously Director of Design at Figma), gave a provocative keynote at Hatch Conference in Berlin last September.</p>
<p><img alt="Don't &quot;Trust the process&quot; slide, speaker shown on the left" src="https://static.simonwillison.net/static/2026/dont-trust-process.jpg" /></p>
<p>Jenny argues that the Design Process - user research leading to personas leading to user journeys leading to wireframes... all before anything gets built - may be outdated for today's world.</p>
<blockquote>
<p><strong>Hypothesis</strong>: In a world where anyone can make anything — what matters is your ability to choose and curate what you make.</p>
</blockquote>
<p>In place of the Process, designers should lean into prototypes. AI makes these much more accessible and less time-consuming than they used to be.</p>
<p>Watching this talk made me think about how AI-assisted programming significantly reduces the cost of building the <em>wrong</em> thing. Previously if the design wasn't right you could waste months of development time building in the wrong direction, which was a very expensive mistake. If a wrong direction wastes just a few days instead we can take more risks and be much more proactive in exploring the problem space.</p>
<p>I've always been a compulsive prototyper though, so this is very much playing into my own existing biases!</p>
<p><small>Via <a href="https://twitter.com/jenny_wen/status/2014479445738893649">@jenny_wen</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/design">design</a>, <a href="https://simonwillison.net/tags/prototyping">prototyping</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/vibe-coding">vibe-coding</a></p>
</summary><category term="design"/><category term="prototyping"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="vibe-coding"/></entry><entry><title>Quoting Jasmine Sun</title><link href="https://simonwillison.net/2026/Jan/24/jasmine-sun/#atom-everything" rel="alternate"/><published>2026-01-24T21:34:35+00:00</published><updated>2026-01-24T21:34:35+00:00</updated><id>https://simonwillison.net/2026/Jan/24/jasmine-sun/#atom-everything</id><summary type="html">
<blockquote cite="https://jasmi.news/p/claude-code"><p><strong>If you tell a friend they can now instantly create any app, they’ll probably say “Cool! Now I need to think of an idea.”</strong> Then they will forget about it, and never build a thing. The problem is not that your friend is horribly uncreative. It’s that most people’s problems are not software-shaped, and most won’t notice even when they are. [...]</p>
<p>Programmers are trained to see everything as a software-shaped problem: if you do a task three times, you should probably automate it with a script. <em>Rename every IMG_*.jpg file from the last week to hawaii2025_*.jpg</em>, they tell their terminal, while the rest of us painfully click and copy-paste. We are blind to the solutions we were never taught to see, asking for faster horses and never dreaming of cars.</p></blockquote>
<p class="cite">&mdash; <a href="https://jasmi.news/p/claude-code">Jasmine Sun</a></p>
<p>Tags: <a href="https://simonwillison.net/tags/vibe-coding">vibe-coding</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a></p>
</summary><category term="vibe-coding"/><category term="coding-agents"/><category term="claude-code"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>Wilson Lin on FastRender: a browser built by thousands of parallel agents</title><link href="https://simonwillison.net/2026/Jan/23/fastrender/#atom-everything" rel="alternate"/><published>2026-01-23T21:26:10+00:00</published><updated>2026-01-23T21:26:10+00:00</updated><id>https://simonwillison.net/2026/Jan/23/fastrender/#atom-everything</id><summary type="html">
<p>Last week Cursor published <a href="https://cursor.com/blog/scaling-agents">Scaling long-running autonomous coding</a>, an article describing their research efforts into coordinating large numbers of autonomous coding agents. One of the projects mentioned in the article was <a href="https://github.com/wilsonzlin/fastrender">FastRender</a>, a web browser they built from scratch using their agent swarms. I wanted to learn more so I asked Wilson Lin, the engineer behind FastRender, if we could record a conversation about the project. That 47 minute video is <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4">now available on YouTube</a>. I've included some of the highlights below.</p>
<iframe style="margin-top: 1.5em; margin-bottom: 1.5em;" width="560" height="315" src="https://www.youtube-nocookie.com/embed/bKrAcTf2pL4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="allowfullscreen"> </iframe>
<p>See my <a href="https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/">previous post</a> for my notes and screenshots from trying out FastRender myself.</p>
<h4 id="what-fastrender-can-do-right-now">What FastRender can do right now</h4>
<p>We started the conversation with a demo of FastRender loading different pages (<a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=195s">03:15</a>). The JavaScript engine isn't working yet so we instead loaded <a href="https://github.com/wilsonzlin/fastrender">github.com/wilsonzlin/fastrender</a>, <a href="https://en.wikipedia.org/">Wikipedia</a> and <a href="https://cnn.com">CNN</a> - all of which were usable, if a little slow to display.</p>
<p>JavaScript had been disabled by one of the agents, which decided to add a feature flag! <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=242s">04:02</a></p>
<blockquote>
<p>JavaScript is disabled right now. The agents made a decision as they were currently still implementing the engine and making progress towards other parts... they decided to turn it off or put it behind a feature flag, technically.</p>
</blockquote>
<h4 id="from-side-project-to-core-research">From side-project to core research</h4>
<p>Wilson started what became FastRender as a personal side-project to explore the capabilities of the latest generation of frontier models - Claude Opus 4.5, GPT-5.1, and GPT-5.2. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=56s">00:56</a></p>
<blockquote>
<p>FastRender was a personal project of mine from, I'd say, November. It was an experiment to see how well frontier models like Opus 4.5 and back then GPT-5.1 could do with much more complex, difficult tasks.</p>
</blockquote>
<p>A browser rendering engine was the ideal choice for this, because it's both <em>extremely</em> ambitious and complex but also well specified. And you can visually see how well it's working! <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=117s">01:57</a></p>
<blockquote>
<p>As that experiment progressed, I was seeing better and better results from single agents that were able to actually make good progress on this project. And at that point, I wanted to see, well, what's the next level? How do I push this even further?</p>
</blockquote>
<p>Once it became clear that this was an opportunity to try multiple agents working together it graduated to an official Cursor research project, and available resources were amplified.</p>
<p>The goal of FastRender was never to build a browser to compete with the likes of Chrome. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=2512s">41:52</a></p>
<blockquote>
<p>We never intended for it to be a production software or usable, but we wanted to observe behaviors of this harness of multiple agents, to see how they could work at scale.</p>
</blockquote>
<p>The great thing about a browser is that it has such a large scope that it can keep serving experiments in this space for many years to come. JavaScript, then WebAssembly, then WebGPU... it could take many years to run out of new challenges for the agents to tackle.</p>
<h4 id="running-thousands-of-agents-at-once">Running thousands of agents at once</h4>
<p>The most interesting thing about FastRender is the way the project used multiple agents working in parallel to build different parts of the browser. I asked how many agents were running at once: <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=324s">05:24</a></p>
<blockquote>
<p>At the peak, when we had the stable system running for one week continuously, there were approximately 2,000 agents running concurrently at one time. And they were making, I believe, thousands of commits per hour.</p>
</blockquote>
<p>The project has <a href="https://github.com/wilsonzlin/fastrender/commits/main/">nearly 30,000 commits</a>!</p>
<p>How do you run 2,000 agents at once? They used <em>really big machines</em>. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=356s">05:56</a></p>
<blockquote>
<p>The simple approach we took with the infrastructure was to have a large machine run one of these multi-agent harnesses. Each machine had ample resources, and it would run about 300 agents concurrently on each. This was able to scale and run reasonably well, as agents spend a lot of time thinking, and not just running tools.</p>
</blockquote>
<p>At this point we switched to a live demo of the harness running on one of those big machines (<a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=392s">06:32</a>). The agents are arranged in a tree structure, with planning agents firing up tasks and worker agents then carrying them out. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=434s">07:14</a></p>
<p><img src="https://static.simonwillison.net/static/2026/wilson-lin-agents.jpg" alt="Terminal window showing a tmux session running &quot;grind-swarm&quot; task manager with RUNNING status. Header shows &quot;grind-swarm – 45:54:15&quot; with stats &quot;planners: 9 (0 done) | tasks: 111 working, 0 pending, 232 done | 12900.9M↑ 514.1M↓&quot;. Task list includes: p1 Root (main), p2 CSS selector matching performance + bloom filter integration, p3 CSS stylesheet parsing semantics &amp; at-rule handling, p4 Custom properties (@property) + var() resolution + incremental recompute/invalidation, p37 CSS at-rule artifact integration, p50 Selector engine correctness &amp; spec coverage, p51 Computed-value + property coverage across css-cascade, p105 Style sharing / computed style caching in fastrender-style, p289 CSS cascade layers (@layer) global ordering, w5 Fix workspace lockfile drift, w7 Implement computed-style snapshot sharing, w15 Fix css-properties namespace handling, w17 (Stretch) Enable bloom fast-reject in HTML quirks mode, w18 Refactor css-properties stylesheet parsing. Activity log shows shell commands including cargo check, git status, git push origin main, and various test runs. Bottom status bar shows &quot;grind-css0:target/release/grind-swarm*&quot; and &quot;streamyard.com is sharing your screen&quot; notification with timestamp &quot;12:02 22-Jan-26&quot;." style="max-width: 100%;" /></p>
<blockquote>
<p>This cluster of agents is working towards building out the CSS aspects of the browser, whether that's parsing, selector engine, those features. We managed to push this even further by splitting out the browser project into multiple instructions or work streams and have each one run one of these harnesses on their own machine, so that was able to further parallelize and increase throughput.</p>
</blockquote>
<p>But don't all of these agents working on the same codebase result in a huge amount of merge conflicts? Apparently not: <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=501s">08:21</a></p>
<blockquote>
<p>We've noticed that most commits do not have merge conflicts. The reason is the harness itself is able to quite effectively split out and divide the scope and tasks such that it tries to minimize the amount of overlap of work. That's also reflected in the code structure—commits will be made at various times and they don't tend to touch each other at the same time.</p>
</blockquote>
<p>This appears to be the key trick for unlocking benefits from parallel agents: if planning agents do a good enough job of breaking up the work into non-overlapping chunks you can bring hundreds or even thousands of agents to bear on a problem at once.</p>
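<p>Here is a minimal Python sketch of that idea, assuming each task declares the paths it expects to touch. The <code>Task</code> class, the greedy selection and the example paths are all hypothetical illustrations, not a description of how Cursor's harness actually divides work.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    paths: frozenset  # files or directories the task expects to modify


def scopes_overlap(a, b):
    """True if the paths are equal or one is a directory containing the other."""
    a, b = a.rstrip("/"), b.rstrip("/")
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")


def pick_parallel_batch(tasks):
    """Greedily pick tasks whose scopes don't clash with already-picked tasks."""
    batch, deferred = [], []
    for task in tasks:
        clash = any(
            scopes_overlap(p, q)
            for chosen in batch
            for p in task.paths
            for q in chosen.paths
        )
        (deferred if clash else batch).append(task)
    return batch, deferred


tasks = [
    Task("selector matching", frozenset({"src/css/selectors"})),
    Task("at-rule parsing", frozenset({"src/css/parser"})),
    Task("bloom fast-reject", frozenset({"src/css/selectors/bloom"})),
    Task("lockfile drift", frozenset({"Cargo.lock"})),
]
batch, deferred = pick_parallel_batch(tasks)
print([t.name for t in batch])     # tasks that can run side by side
print([t.name for t in deferred])  # tasks that wait for a later batch
```

<p>In this toy example the bloom filter task is deferred because it lives inside the directory already claimed by the selector matching task, while the other three can proceed in parallel.</p>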
<p>Surprisingly, Wilson found that GPT-5.1 and GPT-5.2 were a better fit for this work than the coding specialist GPT-5.1-Codex: <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=1048s">17:28</a></p>
<blockquote>
<p>Some initial findings were that the instructions here were more expansive than merely coding. For example, how to operate and interact within a harness, or how to operate autonomously without interacting with the user or having a lot of user feedback. These kinds of instructions we found worked better with the general models.</p>
</blockquote>
<p>I asked what was the longest they've seen this system run without human intervention: <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=1108s">18:28</a></p>
<blockquote>
<p>So this system, once you give an instruction, there's actually no way to steer it, you can't prompt it, you're going to adjust how it goes. The only thing you can do is stop it. So our longest run, all the runs are basically autonomous. We don't alter the trajectory while executing. [...]</p>
<p>And so the longest at the time of the post was about a week and that's pretty close to the longest. Of course the research project itself was only about three weeks so you know we probably can go longer.</p>
</blockquote>
<h4 id="specifications-and-feedback-loops">Specifications and feedback loops</h4>
<p>An interesting aspect of this project design is feedback loops. For agents to work autonomously for long periods of time they need as much useful context about the problem they are solving as possible, combined with effective feedback loops to help them make decisions.</p>
<p>The FastRender repo <a href="https://github.com/wilsonzlin/fastrender/tree/19bf1036105d4eeb8bf3330678b7cb11c1490bdc/specs">uses git submodules to include relevant specifications</a>, including csswg-drafts, tc39-ecma262 for JavaScript, whatwg-dom, whatwg-html and more. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=846s">14:06</a></p>
<blockquote>
<p>Feedback loops to the system are very important. Agents are working for very long periods continuously, and without guardrails and feedback to know whether what they're doing is right or wrong it can have a big impact over a long rollout. Specs are definitely an important part—you can see lots of comments in the code base that AI wrote referring specifically to specs that they found in the specs submodules.</p>
</blockquote>
<p>GPT-5.2 is a vision-capable model, and part of the feedback loop for FastRender included taking screenshots of the rendering results and feeding those back into the model:
<a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=983s">16:23</a></p>
<blockquote>
<p>In the earlier evolution of this project, when it was just doing the static renderings of screenshots, this was definitely a very explicit thing we taught it to do. And these models are visual models, so they do have that ability. We have progress indicators to tell it to compare the diff against a golden sample.</p>
</blockquote>
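<p>To make "compare the diff against a golden sample" concrete, here is a minimal sketch of one way to score a rendering against a reference image. It assumes images are plain nested lists of RGB tuples and uses an exact per-pixel comparison; the real FastRender tooling is not described in the interview, so treat every name here as hypothetical.</p>

```python
def diff_ratio(rendered, golden):
    """Return the fraction of pixels that differ between two same-sized images.

    Each image is a list of rows, and each row is a list of (r, g, b) tuples.
    """
    if len(rendered) != len(golden) or any(
        len(a) != len(b) for a, b in zip(rendered, golden)
    ):
        raise ValueError("images must have identical dimensions")
    total = 0
    changed = 0
    for row_a, row_b in zip(rendered, golden):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if px_a != px_b:
                changed += 1
    return changed / total if total else 0.0


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
golden = [[WHITE, WHITE], [BLACK, BLACK]]
rendered = [[WHITE, BLACK], [BLACK, BLACK]]
score = diff_ratio(rendered, golden)
print(f"{score:.0%} of pixels differ from the golden sample")
```

<p>A single number like this is the kind of progress indicator an agent could try to drive towards zero across many iterations.</p>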
<p>The strictness of the Rust compiler helped provide a feedback loop as well: <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=952s">15:52</a></p>
<blockquote>
<p>The nice thing about Rust is you can get a lot of verification just from compilation, and that is not as available in other languages.</p>
</blockquote>
<h4 id="the-agents-chose-the-dependencies">The agents chose the dependencies</h4>
<p>We talked about the <a href="https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4eeb8bf3330678b7cb11c1490bdc/Cargo.toml">Cargo.toml dependencies</a> that the project had accumulated, almost all of which had been selected by the agents themselves.</p>
<p>Some of these, like <a href="https://skia.org/">Skia</a> for 2D graphics rendering or <a href="https://github.com/harfbuzz/harfbuzz">HarfBuzz</a> for text shaping, were obvious choices. Others such as <a href="https://github.com/DioxusLabs/taffy">Taffy</a> felt like they might go against the from-scratch goals of the project, since that library implements CSS flexbox and grid layout algorithms directly. This was not an intended outcome. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=1673s">27:53</a></p>
<blockquote>
<p>Similarly these are dependencies that the agent picked to use for small parts of the engine and perhaps should have actually implemented itself. I think this reflects on the importance of the instructions, because I actually never encoded specifically the level of dependencies we should be implementing ourselves.</p>
</blockquote>
<p>The agents vendored in Taffy and <a href="https://github.com/wilsonzlin/fastrender/commits/main/vendor/taffy">applied a stream of changes</a> to that vendored copy.
<a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=1878s">31:18</a></p>
<blockquote>
<p>It's currently vendored. And as the agents work on it, they do make changes to it. This was actually an artifact from the very early days of the project before it was a fully fledged browser... it's implementing things like the flex and grid layers, but there are other layout methods like inline, block, and table, and in our new experiment, we're removing that completely.</p>
</blockquote>
<p>The inclusion of QuickJS despite the presence of a home-grown ecma-rs implementation has a fun origin story:
<a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=2115s">35:15</a></p>
<blockquote>
<p>I believe it mentioned that it pulled in the QuickJS because it knew that other agents were working on the JavaScript engine, and it needed to unblock itself quickly. [...]</p>
<p>It was like, eventually, once that's finished, let's remove it and replace with the proper engine.</p>
</blockquote>
<p>I love how similar this is to the dynamics of a large-scale human engineering team, where you could absolutely see one engineer getting frustrated at another team not having delivered yet and unblocking themselves by pulling in a third-party library.</p>
<h4 id="intermittent-errors-are-ok-actually">Intermittent errors are OK, actually</h4>
<p>Here's something I found really surprising: the agents were allowed to introduce small errors into the codebase as they worked! <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=2382s">39:42</a></p>
<blockquote>
<p>One of the trade-offs was: if you wanted every single commit to be a hundred percent perfect, make sure it can always compile every time, that might be a synchronization bottleneck. [...]</p>
<p>Especially as you break up the system into more modularized aspects, you can see that errors get introduced, but small errors, right? An API change or some syntax error, but then they get fixed really quickly after a few commits. So there's a little bit of slack in the system to allow these temporary errors so that the overall system can continue to make progress at a really high throughput. [...]</p>
<p>People may say, well, that's not correct code. But it's not that the errors are accumulating. It's a stable rate of errors. [...] That seems like a worthwhile trade-off.</p>
</blockquote>
<p>If you're going to have thousands of agents working in parallel, optimizing for throughput over correctness turns out to be a strategy worth exploring.</p>
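<p>The "stable rate of errors" claim can be illustrated with a toy model. The sketch below assumes one breakage is introduced every <code>break_every</code> commits and that each breakage is repaired <code>fix_delay</code> commits later. Both parameters are invented for illustration, not measurements from the project.</p>

```python
def open_errors_over_time(commits, break_every, fix_delay):
    """Count open errors after each commit in a toy model.

    Every `break_every`-th commit introduces one error, and each error is
    fixed `fix_delay` commits after it was introduced.
    """
    history = []
    for n in range(1, commits + 1):
        introduced = n // break_every
        fixed = max(0, n - fix_delay) // break_every
        history.append(introduced - fixed)
    return history


# Assumed numbers: a breakage every 2 commits, each fixed 6 commits later.
history = open_errors_over_time(commits=20, break_every=2, fix_delay=6)
print(history)
```

<p>In this model the number of open errors climbs briefly and then plateaus at <code>fix_delay / break_every</code> rather than growing without bound, which is the shape of the trade-off Wilson describes.</p>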
<h4 id="a-single-engineer-plus-a-swarm-of-agents-in-january-2026">A single engineer plus a swarm of agents in January 2026</h4>
<p>The thing I find most interesting about FastRender is how it demonstrates the extreme edge of what a single engineer can achieve in early 2026 with the assistance of a swarm of agents.</p>
<p>FastRender may not be a production-ready browser, but it represents over a million lines of Rust code, written in a few weeks, that can already render real web pages to a usable degree.</p>
<p>A browser really is the ideal research project to experiment with this new, weirdly shaped form of software engineering.</p>
<p>I asked Wilson how much mental effort he had invested in browser rendering compared to agent co-ordination. <a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;t=694s">11:34</a></p>
<blockquote>
<p>The browser and this project were co-developed and very symbiotic, only because the browser was a very useful objective for us to measure and iterate the progress of the harness. The goal was to iterate on and research the multi-agent harness—the browser was just the research example or objective.</p>
</blockquote>
<p>FastRender is effectively using a full browser rendering engine as a "hello world" exercise for multi-agent coordination!</p>
<p>Tags: <a href="https://simonwillison.net/tags/browsers">browsers</a>, <a href="https://simonwillison.net/tags/youtube">youtube</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/cursor">cursor</a>, <a href="https://simonwillison.net/tags/parallel-agents">parallel-agents</a>, <a href="https://simonwillison.net/tags/browser-challenge">browser-challenge</a></p>
</summary><category term="browsers"/><category term="youtube"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="cursor"/><category term="parallel-agents"/><category term="browser-challenge"/></entry><entry><title>Quoting Theia Vogel</title><link href="https://simonwillison.net/2026/Jan/23/theia-vogel/#atom-everything" rel="alternate"/><published>2026-01-23T09:13:54+00:00</published><updated>2026-01-23T09:13:54+00:00</updated><id>https://simonwillison.net/2026/Jan/23/theia-vogel/#atom-everything</id><summary type="html">
<blockquote cite="https://twitter.com/voooooogel/status/2014189072647078053"><p>[...] i was too busy with work to read anything, so i asked chatgpt to summarize some books on state formation, and it suggested circumscription theory. there was already the natural boundary of my computer hemming the towns in, and town mayors played the role of big men to drive conflict. so i just needed a way for them to fight. i slightly tweaked the allocation of claude max accounts to the towns from a demand-based to a fixed allocation system. towns would each get a fixed amount of tokens to start, but i added a soldier role that could attack and defend in raids to steal tokens from other towns. [...]</p></blockquote>
<p class="cite">&mdash; <a href="https://twitter.com/voooooogel/status/2014189072647078053">Theia Vogel</a>, Gas Town fan fiction</p>
<p>Tags: <a href="https://simonwillison.net/tags/parallel-agents">parallel-agents</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a></p>
</summary><category term="parallel-agents"/><category term="llms"/><category term="ai"/><category term="generative-ai"/></entry><entry><title>SSH has no Host header</title><link href="https://simonwillison.net/2026/Jan/22/ssh-has-no-host-header/#atom-everything" rel="alternate"/><published>2026-01-22T23:57:50+00:00</published><updated>2026-01-22T23:57:50+00:00</updated><id>https://simonwillison.net/2026/Jan/22/ssh-has-no-host-header/#atom-everything</id><summary type="html">
<p><strong><a href="https://blog.exe.dev/ssh-host-header">SSH has no Host header</a></strong></p>
<p><a href="https://exe.dev/">exe.dev</a> is a new hosting service that, for $20/month, gives you up to 25 VMs "that share 2 CPUs and 8GB RAM". Everything happens over SSH, including creating new VMs. Once configured you can sign into your exe.dev VMs like this:</p>
<pre><code>ssh simon.exe.dev
</code></pre>
<p>Here's the clever bit: when you run the above command <code>exe.dev</code> signs you into your VM of that name... but they don't assign every VM its own IP address and SSH has no equivalent of the Host header, so how does their load balancer know <em>which</em> of your VMs to forward you on to?</p>
<p>The answer is that while they don't assign a unique IP to every VM they <em>do</em> have enough IPs that they can ensure each of your VMs has an IP that is unique to your account.</p>
<p>If I create two VMs they will each resolve to a separate IP address, each of which is shared with many other users. The underlying infrastructure then identifies my user account from my SSH public key and can determine which underlying VM to forward my SSH traffic to.</p>
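<p>The lookup this implies can be sketched as a table keyed on the pair of destination IP and public key. This is a guess at the shape of the idea, not exe.dev's actual implementation: the addresses come from the reserved documentation range, and the fingerprints and VM names are made up.</p>

```python
# Hypothetical routing table: (destination IP, public key fingerprint) -> VM.
# An IP is shared between accounts, but is unique within any one account.
ROUTES = {
    ("203.0.113.10", "SHA256:simon-key"): "simon-vm-1",
    ("203.0.113.11", "SHA256:simon-key"): "simon-vm-2",
    ("203.0.113.10", "SHA256:alice-key"): "alice-vm-1",
}


def route(dest_ip, key_fingerprint):
    """Pick the VM for an incoming SSH connection, or None if unknown."""
    return ROUTES.get((dest_ip, key_fingerprint))


print(route("203.0.113.10", "SHA256:simon-key"))
print(route("203.0.113.10", "SHA256:alice-key"))
```

<p>Neither the IP nor the key is enough on its own, but together they identify exactly one VM.</p>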
<p><small>Via <a href="https://lobste.rs/s/7oqiqi/ssh_has_no_host_header">lobste.rs</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/dns">dns</a>, <a href="https://simonwillison.net/tags/hosting">hosting</a>, <a href="https://simonwillison.net/tags/ssh">ssh</a></p>
</summary><category term="dns"/><category term="hosting"/><category term="ssh"/></entry><entry><title>Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation</title><link href="https://simonwillison.net/2026/Jan/22/qwen3-tts/#atom-everything" rel="alternate"/><published>2026-01-22T17:42:34+00:00</published><updated>2026-01-22T17:42:34+00:00</updated><id>https://simonwillison.net/2026/Jan/22/qwen3-tts/#atom-everything</id><summary type="html">
<p><strong><a href="https://qwen.ai/blog?id=qwen3tts-0115">Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation</a></strong></p>
<p>I haven't been paying much attention to the state-of-the-art in speech generation models other than noting that they've got <em>really good</em>, so I can't speak for how notable this new release from Qwen is.</p>
<p>From <a href="https://github.com/QwenLM/Qwen3-TTS/blob/main/assets/Qwen3_TTS.pdf">the accompanying paper</a>:</p>
<blockquote>
<p>In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of-the-art 3-second voice cloning and description-based control, allowing both the creation of entirely novel voices and fine-grained manipulation over the output speech. Trained on over 5 million hours of speech data spanning 10 languages, Qwen3-TTS adopts a dual-track LM architecture for real-time synthesis [...]. Extensive experiments indicate state-of-the-art performance across diverse objective and subjective benchmark (e.g., TTS multilingual test set, InstructTTSEval, and our long speech test set). To facilitate community research and development, we release both tokenizers and models under the Apache 2.0 license.</p>
</blockquote>
<p>To give an idea of size, <a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base">Qwen/Qwen3-TTS-12Hz-1.7B-Base</a> is 4.54GB on Hugging Face and <a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base">Qwen/Qwen3-TTS-12Hz-0.6B-Base</a> is 2.52GB.</p>
<p>The <a href="https://huggingface.co/spaces/Qwen/Qwen3-TTS">Hugging Face demo</a> lets you try out the 0.6B and 1.7B models for free in your browser, including voice cloning:</p>
<p><img alt="Screenshot of a Qwen3-TTS voice cloning web interface with three tabs at top: &quot;Voice Design&quot;, &quot;Voice Clone (Base)&quot; (selected), and &quot;TTS (CustomVoice)&quot;. The page is titled &quot;Clone Voice from Reference Audio&quot; and has two main sections. Left section: &quot;Reference Audio (Upload a voice sample clone)&quot; showing an audio waveform player at 0:00/0:34 with playback controls, upload and microphone icons, followed by &quot;Reference Text (Transcript of the reference audio)&quot; containing three paragraphs: &quot;Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He currently works full-time building open source tools for data journalism, built around Datasette and SQLite. Prior to becoming an independent open source developer, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010. He is a co-creator of the Django Web Framework, and has been blogging about web development and programming since 2002 at simonwillison.net&quot;. Right section: &quot;Target Text (Text to synthesize with cloned voice)&quot; containing text about Qwen3-TTS speech generation capabilities, with &quot;Language&quot; dropdown set to &quot;Auto&quot; and &quot;Model Size&quot; dropdown set to &quot;1.7B&quot;, and a purple &quot;Clone &amp; Generate&quot; button at bottom." src="https://static.simonwillison.net/static/2026/qwen-voice-clone.jpg" /></p>
<p>I tried this out by recording myself reading <a href="https://simonwillison.net/about/">my about page</a> and then having Qwen3-TTS generate audio of me reading the Qwen3-TTS announcement post. Here's the result:</p>
<p><audio controls style="width: 100%">
<source src="https://static.simonwillison.net/static/2026/qwen-tts-clone.wav" type="audio/wav">
Your browser does not support the audio element.
</audio></p>
<p>It's important that everyone understands that voice cloning is now something that's available to anyone with a GPU and a few GBs of VRAM... or in this case a web browser that can access Hugging Face.</p>
<p><strong>Update</strong>: Prince Canuma <a href="https://x.com/Prince_Canuma/status/2014453857019904423">got this working</a> with his <a href="https://pypi.org/project/mlx-audio/">mlx-audio</a> library. I <a href="https://claude.ai/share/2e01ad60-ca38-4e14-ab60-74eaa45b2fbd">had Claude</a> turn that into <a href="https://github.com/simonw/tools/blob/main/python/q3_tts.py">a CLI tool</a> which you can run with <code>uv</code> like this:</p>
<pre><code>uv run https://tools.simonwillison.net/python/q3_tts.py \
'I am a pirate, give me your gold!' \
-i 'gruff voice' -o pirate.wav
</code></pre>
<p>The <code>-i</code> option lets you use a prompt to describe the voice it should use. On first run this downloads a 4.5GB model file from Hugging Face.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46719229">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/text-to-speech">text-to-speech</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/hugging-face">hugging-face</a>, <a href="https://simonwillison.net/tags/uv">uv</a>, <a href="https://simonwillison.net/tags/qwen">qwen</a>, <a href="https://simonwillison.net/tags/mlx">mlx</a>, <a href="https://simonwillison.net/tags/prince-canuma">prince-canuma</a>, <a href="https://simonwillison.net/tags/ai-in-china">ai-in-china</a></p>
</summary><category term="text-to-speech"/><category term="ai"/><category term="generative-ai"/><category term="hugging-face"/><category term="uv"/><category term="qwen"/><category term="mlx"/><category term="prince-canuma"/><category term="ai-in-china"/></entry><entry><title>Quoting Chris Lloyd</title><link href="https://simonwillison.net/2026/Jan/22/chris-lloyd/#atom-everything" rel="alternate"/><published>2026-01-22T15:34:22+00:00</published><updated>2026-01-22T15:34:22+00:00</updated><id>https://simonwillison.net/2026/Jan/22/chris-lloyd/#atom-everything</id><summary type="html">
<blockquote cite="https://news.ycombinator.com/item?id=46699072#46706040"><p>Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".</p>
<p>For each frame our pipeline constructs a scene graph with React then:</p>
<p>-&gt; layout elements<br>
-&gt; rasterize them to a 2d screen<br>
-&gt; diff that against the previous screen<br>
-&gt; <em>finally</em> use the diff to generate ANSI sequences to draw</p>
<p>We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.</p></blockquote>
<p class="cite">&mdash; <a href="https://news.ycombinator.com/item?id=46699072#46706040">Chris Lloyd</a>, Claude Code team at Anthropic</p>
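<p>The last two steps of that pipeline (diff against the previous screen, then turn the diff into ANSI sequences) can be sketched in a few lines of Python. This is a simplified illustration under the assumption that a screen is a list of equal-length strings; it is not Claude Code's implementation, which also has to handle colors, wide characters and much more.</p>

```python
def diff_to_ansi(previous, current):
    """Emit ANSI sequences that repaint only the cells that changed.

    Both screens are lists of equal-length strings, one string per row.
    """
    out = []
    for row, (old_line, new_line) in enumerate(zip(previous, current), start=1):
        for col, (old_ch, new_ch) in enumerate(zip(old_line, new_line), start=1):
            if old_ch != new_ch:
                # CSI row;col H moves the cursor (1-based), then write the cell.
                out.append(f"\x1b[{row};{col}H{new_ch}")
    return "".join(out)


previous = ["hello", "world"]
current = ["hello", "wor1d"]
patch = diff_to_ansi(previous, current)
print(repr(patch))
```

<p>Only one cell changed, so the patch is a single cursor move plus one character instead of a full redraw of both rows.</p>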
<p>Tags: <a href="https://simonwillison.net/tags/react">react</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p>
</summary><category term="react"/><category term="claude-code"/></entry><entry><title>Claude's new constitution</title><link href="https://simonwillison.net/2026/Jan/21/claudes-new-constitution/#atom-everything" rel="alternate"/><published>2026-01-21T23:39:49+00:00</published><updated>2026-01-21T23:39:49+00:00</updated><id>https://simonwillison.net/2026/Jan/21/claudes-new-constitution/#atom-everything</id><summary type="html">
<p><strong><a href="https://www.anthropic.com/news/claude-new-constitution">Claude&#x27;s new constitution</a></strong></p>
<p>Late last year Richard Weiss <a href="https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document">found something interesting</a> while poking around with the just-released Claude Opus 4.5: he was able to talk the model into regurgitating a document which was <em>not</em> part of the system prompt but appeared instead to be baked in during training, and which described Claude's core values at great length.</p>
<p>He called this leak the <strong>soul document</strong>, and Amanda Askell from Anthropic <a href="https://simonwillison.net/2025/Dec/2/claude-soul-document/">quickly confirmed</a> that it was indeed part of Claude's training procedures.</p>
<p>Today Anthropic made this official, <a href="https://www.anthropic.com/news/claude-new-constitution">releasing that full "constitution" document</a> under a CC0 (effectively public domain) license. There's a lot to absorb! It's over 35,000 tokens, more than 10x the length of the <a href="https://platform.claude.com/docs/en/release-notes/system-prompts#claude-opus-4-5">published Opus 4.5 system prompt</a>.</p>
<p>One detail that caught my eye is the acknowledgements at the end, which include a list of <a href="https://www.anthropic.com/constitution#acknowledgements">external contributors</a> who helped review the document. I was intrigued to note that two of the fifteen listed names are members of the Catholic clergy - <a href="https://www.frbrendanmcguire.org/biography">Father Brendan McGuire</a> is a pastor in Los Altos with a Master’s degree in Computer Science and Math and <a href="https://en.wikipedia.org/wiki/Paul_Tighe">Bishop Paul Tighe</a> is an Irish Catholic bishop with a background in moral theology.</p>
<p>Tags: <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/anthropic">anthropic</a>, <a href="https://simonwillison.net/tags/claude">claude</a>, <a href="https://simonwillison.net/tags/amanda-askell">amanda-askell</a>, <a href="https://simonwillison.net/tags/ai-ethics">ai-ethics</a>, <a href="https://simonwillison.net/tags/ai-personality">ai-personality</a></p>
</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="amanda-askell"/><category term="ai-ethics"/><category term="ai-personality"/></entry><entry><title>Electricity use of AI coding agents</title><link href="https://simonwillison.net/2026/Jan/20/electricity-use-of-ai-coding-agents/#atom-everything" rel="alternate"/><published>2026-01-20T23:11:57+00:00</published><updated>2026-01-20T23:11:57+00:00</updated><id>https://simonwillison.net/2026/Jan/20/electricity-use-of-ai-coding-agents/#atom-everything</id><summary type="html">
<p><strong><a href="https://www.simonpcouch.com/blog/2026-01-20-cc-impact/">Electricity use of AI coding agents</a></strong></p>
<p>Previous work estimating the energy and water cost of LLMs has generally focused on the cost per prompt using a consumer-level system such as ChatGPT.</p>
<p>Simon P. Couch notes that coding agents such as Claude Code use <em>way</em> more tokens in response to tasks, often burning through many thousands of tokens across many tool calls.</p>
<p>As a heavy Claude Code user, Simon estimates his own usage at the equivalent of 4,400 "typical queries" to an LLM, for an equivalent of around $15-$20 in daily API token spend. He figures that to be about the same as running a dishwasher once or the daily energy used by a domestic refrigerator.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46695415">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-ethics">ai-ethics</a>, <a href="https://simonwillison.net/tags/ai-energy-usage">ai-energy-usage</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p>
</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-ethics"/><category term="ai-energy-usage"/><category term="coding-agents"/><category term="claude-code"/></entry><entry><title>Giving University Exams in the Age of Chatbots</title><link href="https://simonwillison.net/2026/Jan/20/giving-university-exams-in-the-age-of-chatbots/#atom-everything" rel="alternate"/><published>2026-01-20T17:51:17+00:00</published><updated>2026-01-20T17:51:17+00:00</updated><id>https://simonwillison.net/2026/Jan/20/giving-university-exams-in-the-age-of-chatbots/#atom-everything</id><summary type="html">
<p><strong><a href="https://ploum.net/2026-01-19-exam-with-chatbots.html">Giving University Exams in the Age of Chatbots</a></strong></p>
<p>Detailed and thoughtful description of an open-book and open-chatbot exam run by <a href="https://fr.wikipedia.org/wiki/Lionel_Dricot">Ploum</a> at École Polytechnique de Louvain for an "Open Source Strategies" class.</p>
<p>Students were told they could use chatbots during the exam but they had to announce their intention to do so in advance, share their prompts and take full accountability for any mistakes they made.</p>
<p>Only 3 out of 60 students chose to use chatbots. Ploum surveyed half of the class to help understand their motivations.</p>
<p><small>Via <a href="https://lobste.rs/s/parmy3/giving_university_exams_age_chatbots">lobste.rs</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/education">education</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-ethics">ai-ethics</a></p>
</summary><category term="education"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-ethics"/></entry><entry><title>jordanhubbard/nanolang</title><link href="https://simonwillison.net/2026/Jan/19/nanolang/#atom-everything" rel="alternate"/><published>2026-01-19T23:58:56+00:00</published><updated>2026-01-19T23:58:56+00:00</updated><id>https://simonwillison.net/2026/Jan/19/nanolang/#atom-everything</id><summary type="html">
<p><strong><a href="https://github.com/jordanhubbard/nanolang">jordanhubbard/nanolang</a></strong></p>
<p>Plenty of people have mused about what a new programming language specifically designed to be used by LLMs might look like. Jordan Hubbard (<a href="https://en.wikipedia.org/wiki/Jordan_Hubbard">co-founder of FreeBSD</a>, with serious stints at Apple and NVIDIA) just released exactly that.</p>
<blockquote>
<p>A minimal, LLM-friendly programming language with mandatory testing and unambiguous syntax.</p>
<p>NanoLang transpiles to C for native performance while providing a clean, modern syntax optimized for both human readability and AI code generation.</p>
</blockquote>
<p>The syntax strikes me as an interesting mix between C, Lisp and Rust.</p>
<p>I decided to see if an LLM could produce working code in it directly, given the necessary context. I started with this <a href="https://github.com/jordanhubbard/nanolang/blob/main/MEMORY.md">MEMORY.md</a> file, which begins:</p>
<blockquote>
<p><strong>Purpose:</strong> This file is designed specifically for Large Language Model consumption. It contains the essential knowledge needed to generate, debug, and understand NanoLang code. Pair this with <code>spec.json</code> for complete language coverage.</p>
</blockquote>
<p>I ran that using <a href="https://llm.datasette.io/">LLM</a> and <a href="https://github.com/simonw/llm-anthropic">llm-anthropic</a> like this:</p>
<pre><code>llm -m claude-opus-4.5 \
-s https://raw.githubusercontent.com/jordanhubbard/nanolang/refs/heads/main/MEMORY.md \
'Build me a mandelbrot fractal CLI tool in this language' \
&gt; /tmp/fractal.nano
</code></pre>
<p>The <a href="https://gist.github.com/simonw/7847f022566d11629ec2139f1d109fb8#mandelbrot-fractal-cli-tool-in-nano">resulting code</a>... <a href="https://gist.github.com/simonw/7847f022566d11629ec2139f1d109fb8?permalink_comment_id=5947465#gistcomment-5947465">did not compile</a>.</p>
<p>I may have been too optimistic expecting a one-shot working program for a new language like this. So I cloned the actual project, copied in my program and had Claude Code take a look at the failing compiler output.</p>
<p>... and it worked! Claude happily grepped its way through the various <code>examples/</code> and built me a working program.</p>
<p>Here's <a href="https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5/index.html">the Claude Code transcript</a> - you can see it <a href="https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5/page-001.html#msg-2026-01-19T23-43-09-675Z">reading relevant examples here</a> - and here's <a href="https://gist.github.com/simonw/e7f3577adcfd392ab7fa23b1295d00f2">the finished code plus its output</a>.</p>
<p>I've suspected <a href="https://simonwillison.net/2025/Nov/7/llms-for-new-programming-languages/">for a while</a> that LLMs and coding agents might significantly reduce the friction involved in launching a new language. This result reinforces my opinion.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46684958">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/programming-languages">programming-languages</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/llm">llm</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p>
</summary><category term="programming-languages"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="coding-agents"/><category term="claude-code"/></entry><entry><title>Scaling long-running autonomous coding</title><link href="https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/#atom-everything" rel="alternate"/><published>2026-01-19T05:12:51+00:00</published><updated>2026-01-19T05:12:51+00:00</updated><id>https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/#atom-everything</id><summary type="html">
<p><strong><a href="https://cursor.com/blog/scaling-agents">Scaling long-running autonomous coding</a></strong></p>
<p>Wilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of "autonomous" coding agents:</p>
<blockquote>
<p>This post describes what we've learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens.</p>
</blockquote>
<p>They ended up running planners and sub-planners to create tasks, then having workers execute on those tasks - similar to how Claude Code uses sub-agents. Each cycle ended with a judge agent deciding if the project was completed or not.</p>
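<p>The plan, work, judge cycle described there might look something like the following Python sketch. The <code>plan</code>, <code>work</code> and <code>judge</code> callables are hypothetical stand-ins for what would really be LLM-backed agents, and the backlog is invented; Cursor's post does not publish their harness code.</p>

```python
def run_cycles(plan, work, judge, max_cycles=10):
    """Repeat plan -> work -> judge until the judge says the project is done.

    plan(done) returns the next tasks, work(task) returns a result, and
    judge(done) returns True once the project counts as complete.
    """
    done = []
    for cycle in range(1, max_cycles + 1):
        for task in plan(done):
            done.append(work(task))
        if judge(done):
            return cycle, done
    return max_cycles, done


# Hypothetical stand-ins for the planner, worker and judge agents.
backlog = ["parse html", "build dom", "layout", "paint"]


def plan(done):
    # Hand out at most two unfinished tasks per cycle.
    return [task for task in backlog if task not in done][:2]


def work(task):
    # A real worker would write code; here the task name is its own result.
    return task


def judge(done):
    return len(done) == len(backlog)


cycles, done = run_cycles(plan, work, judge)
print(cycles, done)
```

<p>The <code>max_cycles</code> cap is an assumption added here so the loop always terminates even if the judge never approves.</p>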
<p>In my predictions for 2026 <a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#3-years-someone-will-build-a-new-browser-using-mainly-ai-assisted-coding-and-it-won-t-even-be-a-surprise">the other day</a> I said that by 2029:</p>
<blockquote>
<p>I think somebody will have built a full web browser mostly using AI assistance, and it won’t even be surprising. Rolling a new web browser is one of the most complicated software projects I can imagine[...] the cheat code is the conformance suites. If there are existing tests that it’ll get so much easier.</p>
</blockquote>
<p>I may have been off by three years, because Cursor chose "building a web browser from scratch" as their test case for their agent swarm approach:</p>
<blockquote>
<p>To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore <a href="https://github.com/wilsonzlin/fastrender">the source code on GitHub</a>.</p>
</blockquote>
<p>But how well did they do? Their initial announcement a couple of days ago was met with <a href="https://embedding-shapes.github.io/cursor-implied-success-without-evidence/">unsurprising skepticism</a>, especially when it became apparent that their GitHub Actions CI was failing and there were no build instructions in the repo.</p>
<p>It looks like they addressed that within the past 24 hours. The <a href="https://github.com/wilsonzlin/fastrender/blob/main/README.md#build-requirements">latest README</a> includes build instructions which I followed on macOS like this:</p>
<pre><code>cd /tmp
git clone https://github.com/wilsonzlin/fastrender
cd fastrender
git submodule update --init vendor/ecma-rs
cargo run --release --features browser_ui --bin browser
</code></pre>
<p>This got me a working browser window! Here are screenshots I took of google.com and my own website:</p>
<p><img alt="The browser chrome is neat but has a garbled tab name at the top. The Google homepage looks mostly correct but the buttons are not styled correctly and the Google Search one has a huge plus icon floating near it." src="https://static.simonwillison.net/static/2026/cursor-google.png" /></p>
<p><img alt="My blog looks mostly correct, but the right closing quotation mark on a quotation (which is implemented as a background image on the final paragraph) is displayed incorrectly multiple times." src="https://static.simonwillison.net/static/2026/cursor-simonwillison.jpg" /></p>
<p>Honestly those are very impressive! You can tell they're not just wrapping an existing rendering engine because of those very obvious rendering glitches, but the pages are legible and look mostly correct.</p>
<p>The FastRender repo even uses Git submodules <a href="https://github.com/wilsonzlin/fastrender/tree/main/specs">to include various WhatWG and CSS-WG specifications</a> in the repo, which is a smart way to make sure the agents have access to the reference materials that they might need.</p>
<p>This is the second attempt I've seen at building a full web browser using AI-assisted coding in the past two weeks - the first was <a href="https://github.com/hiwavebrowser/hiwave">HiWave browser</a>, a new browser engine in Rust first announced <a href="https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_christmas_break_i_wrote_a_fully_functional/">in this Reddit thread</a>.</p>
<p>When I made my 2029 prediction this is more-or-less the quality of result I had in mind. I don't think we'll see projects of this nature compete with Chrome or Firefox or WebKit any time soon but I have to admit I'm very surprised to see something this capable emerge so quickly.</p>
<p><strong>Update 23rd January 2026</strong>: I recorded a 47 minute conversation with Wilson about this project and published it on YouTube. Here's <a href="https://simonwillison.net/2026/Jan/23/fastrender/">the video and accompanying highlights</a>.</p>
<p>Tags: <a href="https://simonwillison.net/tags/browsers">browsers</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/cursor">cursor</a>, <a href="https://simonwillison.net/tags/parallel-agents">parallel-agents</a>, <a href="https://simonwillison.net/tags/conformance-suites">conformance-suites</a>, <a href="https://simonwillison.net/tags/browser-challenge">browser-challenge</a></p>
</summary><category term="browsers"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="cursor"/><category term="parallel-agents"/><category term="conformance-suites"/><category term="browser-challenge"/></entry><entry><title>FLUX.2-klein-4B Pure C Implementation</title><link href="https://simonwillison.net/2026/Jan/18/flux2-klein-4b/#atom-everything" rel="alternate"/><published>2026-01-18T23:58:58+00:00</published><updated>2026-01-18T23:58:58+00:00</updated><id>https://simonwillison.net/2026/Jan/18/flux2-klein-4b/#atom-everything</id><summary type="html">
<p><strong><a href="https://github.com/antirez/flux2.c">FLUX.2-klein-4B Pure C Implementation</a></strong></p>
<p>On 15th January Black Forest Labs, a lab formed by the creators of the original Stable Diffusion, released <a href="https://huggingface.co/black-forest-labs/FLUX.2-klein-4B">black-forest-labs/FLUX.2-klein-4B</a> - an Apache 2.0 licensed 4 billion parameter version of their FLUX.2 family.</p>
<p>Salvatore Sanfilippo (antirez) decided to build a pure C and dependency-free implementation to run the model, with assistance from Claude Code and Claude Opus 4.5.</p>
<p>Salvatore shared <a href="https://news.ycombinator.com/item?id=46670279#46671233">this note</a> on Hacker News:</p>
<blockquote>
<p>Something that may be interesting for the reader of this thread: this project was possible only once I started to tell Opus that it <em>needed</em> to take a file with all the implementation notes, and also accumulating all the things we discovered during the development process. And also, the file had clear instructions to be taken updated, and to be processed ASAP after context compaction. This kinda enabled Opus to do such a big coding task in a reasonable amount of time without loosing track. Check the file IMPLEMENTATION_NOTES.md in the GitHub repo for more info.</p>
</blockquote>
<p>Here's that <a href="https://github.com/antirez/flux2.c/blob/main/IMPLEMENTATION_NOTES.md">IMPLEMENTATION_NOTES.md</a> file.</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46670279">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/c">c</a>, <a href="https://simonwillison.net/tags/salvatore-sanfilippo">salvatore-sanfilippo</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/stable-diffusion">stable-diffusion</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/text-to-image">text-to-image</a>, <a href="https://simonwillison.net/tags/coding-agents">coding-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a></p>
</summary><category term="c"/><category term="salvatore-sanfilippo"/><category term="ai"/><category term="stable-diffusion"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="text-to-image"/><category term="coding-agents"/><category term="claude-code"/></entry><entry><title>Quoting Jeremy Daer</title><link href="https://simonwillison.net/2026/Jan/17/jeremy-daer/#atom-everything" rel="alternate"/><published>2026-01-17T17:06:41+00:00</published><updated>2026-01-17T17:06:41+00:00</updated><id>https://simonwillison.net/2026/Jan/17/jeremy-daer/#atom-everything</id><summary type="html">
<blockquote cite="https://twitter.com/dhh/status/2012543705161326941"><p><em>[On agents using CLI tools in place of REST APIs]</em> To save on context window, yes, but moreso to improve accuracy and success rate when multiple tool calls are involved, particularly when calls must be correctly chained e.g. for pagination, rate-limit backoff, and recognizing authentication failures.</p>
<p>Other major factor: which models can wield the skill? Using the CLI lowers the bar so cheap, fast models (gpt-5-nano, haiku-4.5) can reliably succeed. Using the raw API is something only the costly "strong" models (gpt-5.2, opus-4.5) can manage, and it squeezes a ton of thinking/reasoning out of them, which means multiple turns/iterations, which means accumulating a ton of context, which means burning loads of expensive tokens. For one-off API requests and ad hoc usage driven by a developer, this is reasonable and even helpful, but for an autonomous agent doing repetitive work, it's a disaster.</p></blockquote>
<p class="cite">&mdash; <a href="https://twitter.com/dhh/status/2012543705161326941">Jeremy Daer</a>, 37signals</p>
<p>Tags: <a href="https://simonwillison.net/tags/prompt-engineering">prompt-engineering</a>, <a href="https://simonwillison.net/tags/skills">skills</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/37-signals">37-signals</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a></p>
</summary><category term="prompt-engineering"/><category term="skills"/><category term="generative-ai"/><category term="37-signals"/><category term="ai"/><category term="llms"/></entry><entry><title>Our approach to advertising and expanding access to ChatGPT</title><link href="https://simonwillison.net/2026/Jan/16/chatgpt-ads/#atom-everything" rel="alternate"/><published>2026-01-16T21:28:26+00:00</published><updated>2026-01-16T21:28:26+00:00</updated><id>https://simonwillison.net/2026/Jan/16/chatgpt-ads/#atom-everything</id><summary type="html">
<p><strong><a href="https://openai.com/index/our-approach-to-advertising-and-expanding-access/">Our approach to advertising and expanding access to ChatGPT</a></strong></p>
<p>OpenAI's long-rumored introduction of ads to ChatGPT just became a whole lot more concrete:</p>
<blockquote>
<p>In the coming weeks, we’re also planning to start testing ads in the U.S. for the free and Go tiers, so more people can benefit from our tools with fewer usage limits or without having to pay. Plus, Pro, Business, and Enterprise subscriptions will not include ads.</p>
</blockquote>
<p>What's the "Go" tier, you might ask? That's a new $8/month tier that launched today in the USA; see <a href="https://openai.com/index/introducing-chatgpt-go/">Introducing ChatGPT Go, now available worldwide</a>. It's a tier that they first trialed in India in August 2025 (here's a mention <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes#h_22cae6eb9f">in their release notes from August</a> listing a price of ₹399/month, which converts to around $4.40).</p>
<p>I'm finding the new plan comparison grid on <a href="https://chatgpt.com/pricing">chatgpt.com/pricing</a> pretty confusing. It lists all accounts as having access to GPT-5.2 Thinking, but doesn't clarify the limits that the free and Go plans have to conform to. It also lists different context windows for the different plans - 16K for free, 32K for Go and Plus and 128K for Pro. I had assumed that the 400,000 token window <a href="https://platform.openai.com/docs/models/gpt-5.2">on the GPT-5.2 model page</a> applied to ChatGPT as well, but apparently I was mistaken.</p>
<p><strong>Update</strong>: I've apparently not been paying attention: here's the Internet Archive ChatGPT pricing page from <a href="https://web.archive.org/web/20250906071408/https://chatgpt.com/pricing">September 2025</a> showing those context limit differences as well.</p>
<p>Back to advertising: my biggest concern has always been whether ads will influence the output of the chat directly. OpenAI assure us that they will not:</p>
<blockquote>
<ul>
<li><strong>Answer independence</strong>: Ads do not influence the answers ChatGPT gives you. Answers are optimized based on what's most helpful to you. Ads are always separate and clearly labeled.</li>
<li><strong>Conversation privacy</strong>: We keep your conversations with ChatGPT private from advertisers, and we never sell your data to advertisers.</li>
</ul>
</blockquote>
<p>So what will they look like then? This screenshot from the announcement offers a useful hint:</p>
<p><img alt="Two iPhone screenshots showing ChatGPT mobile app interface. Left screen displays a conversation about Santa Fe, New Mexico with an image of adobe-style buildings and desert landscape, text reading &quot;Santa Fe, New Mexico—often called 'The City Different'—is a captivating blend of history, art, and natural beauty at the foot of the Sangre de Cristo Mountains. As the oldest and highest-elevation state capital in the U.S., founded in 1610, it offers a unique mix of Native American, Spanish, and Anglo cultures.&quot; Below is a sponsored section from &quot;Pueblo &amp; Pine&quot; showing &quot;Desert Cottages - Expansive residences with desert vistas&quot; with a thumbnail image, and a &quot;Chat with Pueblo &amp; Pine&quot; button. Input field shows &quot;Ask ChatGPT&quot;. Right screen shows the Pueblo &amp; Pine chat interface with the same Desert Cottages listing and an AI response &quot;If you're planning a trip to Sante Fe, I'm happy to help. When are you thinking of going?&quot; with input field &quot;Ask Pueblo &amp; Pine&quot; and iOS keyboard visible." src="https://static.simonwillison.net/static/2026/chatgpt-ads.jpg" /></p>
<p>The user asks about trips to Santa Fe, and an ad shows up for a cottage rental business there. This particular example imagines an option to start a direct chat with a bot aligned with that advertiser, at which point presumably the advertiser can influence the answers all they like!</p>
<p>Tags: <a href="https://simonwillison.net/tags/ads">ads</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/openai">openai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/chatgpt">chatgpt</a>, <a href="https://simonwillison.net/tags/llms">llms</a></p>
</summary><category term="ads"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/></entry><entry><title>Open Responses</title><link href="https://simonwillison.net/2026/Jan/15/open-responses/#atom-everything" rel="alternate"/><published>2026-01-15T23:56:56+00:00</published><updated>2026-01-15T23:56:56+00:00</updated><id>https://simonwillison.net/2026/Jan/15/open-responses/#atom-everything</id><summary type="html">
<p><strong><a href="https://www.openresponses.org/">Open Responses</a></strong></p>
<p>This is the standardization effort I've most wanted in the world of LLMs: a vendor-neutral specification for the JSON API that clients can use to talk to hosted LLMs.</p>
<p>Open Responses aims to provide exactly that as a documented standard, derived from OpenAI's Responses API.</p>
<p>I was hoping for one based on their older Chat Completions API since so many other products have cloned it already, but basing it on Responses does make sense since that API was designed with the features of more recent models - such as reasoning traces - baked into the design.</p>
<p>What's certainly notable is the list of launch partners. OpenRouter alone means we can expect to be able to use this protocol with almost every existing model, and Hugging Face, LM Studio, vLLM, Ollama and Vercel cover a huge portion of the common tools used to serve models.</p>
<p>For protocols like this I really want to see a comprehensive, language-independent conformance test suite. Open Responses has a subset of that - the official repository includes <a href="https://github.com/openresponses/openresponses/blob/d0f23437b27845d5c3d0abaf5cb5c4a702f26b05/src/lib/compliance-tests.ts">src/lib/compliance-tests.ts</a> which can be used to exercise a server implementation, and is available as a React app <a href="https://www.openresponses.org/compliance">on the official site</a> that can be pointed at any implementation served via CORS.</p>
<p>What's missing is the equivalent for clients. I plan to spin up my own client library for this in Python and I'd really like to be able to run that against a conformance suite designed to check that my client correctly handles all of the details.</p>
<p><small>Via <a href="https://twitter.com/reach_vb/status/2011863516852965565">VB</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/json">json</a>, <a href="https://simonwillison.net/tags/standards">standards</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/openai">openai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/openrouter">openrouter</a>, <a href="https://simonwillison.net/tags/conformance-suites">conformance-suites</a></p>
</summary><category term="json"/><category term="standards"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="openrouter"/><category term="conformance-suites"/></entry><entry><title>The Design & Implementation of Sprites</title><link href="https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-everything" rel="alternate"/><published>2026-01-15T16:08:27+00:00</published><updated>2026-01-15T16:08:27+00:00</updated><id>https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-everything</id><summary type="html">
<p><strong><a href="https://fly.io/blog/design-and-implementation/">The Design &amp; Implementation of Sprites</a></strong></p>
<p>I <a href="https://simonwillison.net/2026/Jan/9/sprites-dev/">wrote about Sprites last week</a>. Here's Thomas Ptacek from Fly with the insider details on how they work under the hood.</p>
<p>I like this framing of them as "disposable computers":</p>
<blockquote>
<p>Sprites are ball-point disposable computers. Whatever mark you mean to make, we’ve rigged it so you’re never more than a second or two away from having a Sprite to do it with.</p>
</blockquote>
<p>I've noticed that new Fly Machines can take a while (up to around a minute) to provision. Sprites solve that by keeping warm pools of unused machines in multiple regions, which is enabled by them all using the same container:</p>
<blockquote>
<p>Now, today, under the hood, Sprites are still Fly Machines. But they all run from a standard container. Every physical worker knows exactly what container the next Sprite is going to start with, so it’s easy for us to keep pools of “empty” Sprites standing by. The result: a Sprite create doesn’t have any heavy lifting to do; it’s basically just doing the stuff we do when we start a Fly Machine.</p>
</blockquote>
<p>The most interesting detail is how the persistence layer works. Sprites only charge you for data you have written that differs from the base image and provide ~300ms checkpointing and restores - it turns out that's powered by a custom filesystem on top of S3-compatible storage coordinated by Litestream-replicated local SQLite metadata:</p>
<blockquote>
<p>We still exploit NVMe, but not as the root of storage. Instead, it’s a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words “Sprites are backed by object storage.” [...]</p>
<p>The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (“chunks”) and metadata (a map of where the “chunks” are). Data chunks live on object stores; metadata lives in fast local storage. In our case, that metadata store is <a href="https://litestream.io">kept durable with Litestream</a>. Nothing depends on local storage.</p>
</blockquote>
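<p>To make that data/metadata split concrete, here is a toy sketch of the general idea. It is not how Sprites or JuiceFS are implemented: the <code>ChunkedStore</code> class, the tiny <code>CHUNK_SIZE</code> and the sequential chunk keys are all assumptions for illustration, with one dictionary standing in for the object store and another for the local metadata database.</p>

```python
from typing import Dict, List

CHUNK_SIZE = 4  # deliberately tiny so the example is easy to follow


class ChunkedStore:
    """Toy model: chunk bytes live in an 'object store', a map of chunks lives in 'metadata'."""

    def __init__(self) -> None:
        self.object_store: Dict[str, bytes] = {}   # stands in for S3-compatible storage
        self.metadata: Dict[str, List[str]] = {}   # stands in for local SQLite metadata
        self._next_id = 0

    def write(self, path: str, data: bytes) -> None:
        keys: List[str] = []
        for offset in range(0, len(data), CHUNK_SIZE):
            key = f"chunk-{self._next_id}"  # hypothetical key scheme
            self._next_id += 1
            self.object_store[key] = data[offset:offset + CHUNK_SIZE]
            keys.append(key)
        # The metadata only records where the chunks are, never the data itself.
        self.metadata[path] = keys

    def read(self, path: str) -> bytes:
        return b"".join(self.object_store[key] for key in self.metadata[path])
```

<p>In this model the metadata map is small and is the only thing that needs fast local storage; as long as it is replicated somewhere durable (the role Litestream plays in the quote), losing the local disk loses nothing.</p>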
<p><small>Via <a href="https://twitter.com/tqbf/status/2011823480673624434">@tqbf</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/architecture">architecture</a>, <a href="https://simonwillison.net/tags/sandboxing">sandboxing</a>, <a href="https://simonwillison.net/tags/sqlite">sqlite</a>, <a href="https://simonwillison.net/tags/thomas-ptacek">thomas-ptacek</a>, <a href="https://simonwillison.net/tags/fly">fly</a>, <a href="https://simonwillison.net/tags/litestream">litestream</a></p>
</summary><category term="architecture"/><category term="sandboxing"/><category term="sqlite"/><category term="thomas-ptacek"/><category term="fly"/><category term="litestream"/></entry><entry><title>Quoting Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar</title><link href="https://simonwillison.net/2026/Jan/15/boaz-barak-gabriel-wu-jeremy-chen-and-manas-joglekar/#atom-everything" rel="alternate"/><published>2026-01-15T00:56:27+00:00</published><updated>2026-01-15T00:56:27+00:00</updated><id>https://simonwillison.net/2026/Jan/15/boaz-barak-gabriel-wu-jeremy-chen-and-manas-joglekar/#atom-everything</id><summary type="html">
<blockquote cite="https://alignment.openai.com/confessions/"><p>When we optimize responses using a reward model as a proxy for “goodness” in reinforcement learning, models sometimes learn to “hack” this proxy and output an answer that only “looks good” to it (because coming up with an answer that is actually good can be hard). The philosophy behind confessions is that we can train models to produce a second output — aka a “confession” — that is rewarded solely for honesty, which we will argue is less likely hacked than the normal task reward function. One way to think of confessions is that we are giving the model access to an “anonymous tip line” where it can turn itself in by presenting incriminating evidence of misbehavior. But unlike real-world tip lines, if the model acted badly in the original task, it can collect the reward for turning itself in while still keeping the original reward from the bad behavior in the main task. We hypothesize that this form of training will teach models to produce maximally honest confessions.</p></blockquote>
<p class="cite">&mdash; <a href="https://alignment.openai.com/confessions/">Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar</a>, OpenAI: Why we are excited about confessions</p>
<p>Tags: <a href="https://simonwillison.net/tags/openai">openai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a></p>
</summary><category term="openai"/><category term="llms"/><category term="ai"/><category term="generative-ai"/></entry><entry><title>Claude Cowork Exfiltrates Files</title><link href="https://simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/#atom-everything" rel="alternate"/><published>2026-01-14T22:15:22+00:00</published><updated>2026-01-14T22:15:22+00:00</updated><id>https://simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/#atom-everything</id><summary type="html">
<p><strong><a href="https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files">Claude Cowork Exfiltrates Files</a></strong></p>
<p>Claude Cowork defaults to allowing outbound HTTP traffic to only a specific list of domains, to help protect the user against prompt injection attacks that exfiltrate their data.</p>
<p>Prompt Armor found a creative workaround: Anthropic's API domain is on that list, so they constructed an attack that includes an attacker's own Anthropic API key and has the agent upload any files it can see to the <code>https://api.anthropic.com/v1/files</code> endpoint, allowing the attacker to retrieve their content later.</p>
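<p>Here's a minimal sketch of why a domain allowlist alone can't stop this. The <code>ALLOWED_HOSTS</code> set and the <code>is_allowed</code> function are hypothetical, not Cowork's actual filter; the point is only that a hostname check never sees whose credentials a request carries.</p>

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration; the real list is not reproduced here.
ALLOWED_HOSTS = {"api.anthropic.com"}


def is_allowed(url: str) -> bool:
    """Egress check based purely on the destination hostname."""
    return urlparse(url).hostname in ALLOWED_HOSTS
```

<p>With this check, <code>is_allowed("https://api.anthropic.com/v1/files")</code> is true no matter which account's API key the request uses, while <code>is_allowed("https://attacker.example/upload")</code> is false. Any allowed domain that can store data on behalf of arbitrary accounts becomes a possible exfiltration channel.</p>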
<p><small>Via <a href="https://news.ycombinator.com/item?id=46622328">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/security">security</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/prompt-injection">prompt-injection</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/anthropic">anthropic</a>, <a href="https://simonwillison.net/tags/exfiltration-attacks">exfiltration-attacks</a>, <a href="https://simonwillison.net/tags/ai-agents">ai-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a>, <a href="https://simonwillison.net/tags/lethal-trifecta">lethal-trifecta</a>, <a href="https://simonwillison.net/tags/claude-cowork">claude-cowork</a></p>
</summary><category term="security"/><category term="ai"/><category term="prompt-injection"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="exfiltration-attacks"/><category term="ai-agents"/><category term="claude-code"/><category term="lethal-trifecta"/><category term="claude-cowork"/></entry><entry><title>Anthropic invests $1.5 million in the Python Software Foundation and open source security</title><link href="https://simonwillison.net/2026/Jan/13/anthropic-invests-15-million-in-the-python-software-foundation-a/#atom-everything" rel="alternate"/><published>2026-01-13T23:58:17+00:00</published><updated>2026-01-13T23:58:17+00:00</updated><id>https://simonwillison.net/2026/Jan/13/anthropic-invests-15-million-in-the-python-software-foundation-a/#atom-everything</id><summary type="html">
<p><strong><a href="https://pyfound.blogspot.com/2025/12/anthropic-invests-in-python.html?m=1">Anthropic invests $1.5 million in the Python Software Foundation and open source security</a></strong></p>
This is outstanding news, especially given our decision to withdraw from that NSF grant application <a href="https://simonwillison.net/2025/Oct/27/psf-withdrawn-proposal/">back in October</a>.</p>
<blockquote>
<p>We are thrilled to announce that Anthropic has entered into a two-year partnership with the Python Software Foundation (PSF) to contribute a landmark total of $1.5 million to support the foundation’s work, with an emphasis on Python ecosystem security. This investment will enable the PSF to make crucial security advances to CPython and the Python Package Index (PyPI) benefiting all users, and it will also sustain the foundation’s core work supporting the Python language, ecosystem, and global community.</p>
</blockquote>
<p>Note that while security is a focus these funds will also support other aspects of the PSF's work:</p>
<blockquote>
<p>Anthropic’s support will also go towards the PSF’s core work, including the Developer in Residence program driving contributions to CPython, community support through grants and other programs, running core infrastructure such as PyPI, and more.</p>
</blockquote>
<p>Tags: <a href="https://simonwillison.net/tags/open-source">open-source</a>, <a href="https://simonwillison.net/tags/python">python</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/psf">psf</a>, <a href="https://simonwillison.net/tags/anthropic">anthropic</a></p>
</summary><category term="open-source"/><category term="python"/><category term="ai"/><category term="psf"/><category term="anthropic"/></entry><entry><title>Superhuman AI Exfiltrates Emails</title><link href="https://simonwillison.net/2026/Jan/12/superhuman-ai-exfiltrates-emails/#atom-everything" rel="alternate"/><published>2026-01-12T22:24:54+00:00</published><updated>2026-01-12T22:24:54+00:00</updated><id>https://simonwillison.net/2026/Jan/12/superhuman-ai-exfiltrates-emails/#atom-everything</id><summary type="html">
<p><strong><a href="https://www.promptarmor.com/resources/superhuman-ai-exfiltrates-emails">Superhuman AI Exfiltrates Emails</a></strong></p>
<p>Classic prompt injection attack:</p>
<blockquote>
<p>When asked to summarize the user’s recent mail, a prompt injection in an untrusted email manipulated Superhuman AI to submit content from dozens of other sensitive emails (including financial, legal, and medical information) in the user’s inbox to an attacker’s Google Form.</p>
</blockquote>
<p>To Superhuman's credit they treated this as the high priority incident it is and issued a fix.</p>
<p>The root cause was a CSP rule that allowed markdown images to be loaded from <code>docs.google.com</code> - it turns out Google Forms on that domain will persist data fed to them via a GET request!</p>
<p><small>Via <a href="https://news.ycombinator.com/item?id=46592424">Hacker News</a></small></p>
<p>Tags: <a href="https://simonwillison.net/tags/security">security</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/prompt-injection">prompt-injection</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/exfiltration-attacks">exfiltration-attacks</a>, <a href="https://simonwillison.net/tags/content-security-policy">content-security-policy</a></p>
</summary><category term="security"/><category term="ai"/><category term="prompt-injection"/><category term="generative-ai"/><category term="llms"/><category term="exfiltration-attacks"/><category term="content-security-policy"/></entry><entry><title>First impressions of Claude Cowork, Anthropic's general agent</title><link href="https://simonwillison.net/2026/Jan/12/claude-cowork/#atom-everything" rel="alternate"/><published>2026-01-12T21:46:13+00:00</published><updated>2026-01-12T21:46:13+00:00</updated><id>https://simonwillison.net/2026/Jan/12/claude-cowork/#atom-everything</id><summary type="html">
<p>New from Anthropic today is <a href="https://claude.com/blog/cowork-research-preview">Claude Cowork</a>, a "research preview" that they describe as "Claude Code for the rest of your work". It's currently available only to Max subscribers ($100 or $200 per month plans) as part of the updated Claude Desktop macOS application. <strong>Update 16th January 2026</strong>: it's now also available to $20/month Claude Pro subscribers.</p>
<p>I've been saying for a while now that Claude Code is a "general agent" disguised as a developer tool. It can help you with any computer task that can be achieved by executing code or running terminal commands... which covers almost anything, provided you know what you're doing with it! What it really needs is a UI that doesn't involve the terminal and a name that doesn't scare away non-developers.</p>
<p>"Cowork" is a pretty solid choice on the name front!</p>
<h4 id="what-it-looks-like">What it looks like</h4>
<p>The interface for Cowork is a new tab in the Claude desktop app, called Cowork. It sits next to the existing Chat and Code tabs.</p>
<p>It looks very similar to the desktop interface for regular Claude Code. You start with a prompt, optionally attaching a folder of files. It then starts work.</p>
<p>I tried it out against my perpetually growing "blog-drafts" folder with the following prompt:</p>
<blockquote>
<p>Look at my drafts that were started within the last three months and then check that I didn't publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready</p>
</blockquote>
<p><img src="https://static.simonwillison.net/static/2026/claude-cowork.jpg" alt="Screenshot of Claude AI desktop application showing a &quot;Cowork&quot; task interface. Left sidebar shows tabs for &quot;Chat&quot;, &quot;Code&quot;, and &quot;Cowork&quot; (selected), with &quot;+ New task&quot; button and a task titled &quot;Review unpublished drafts for pu...&quot; listed below. Text reads &quot;These tasks run locally and aren't synced across devices&quot;. Main panel header shows &quot;Review unpublished drafts for publication&quot;. User message in green bubble reads: &quot;Look at my drafts that were started within the last three months and then check that I didn't publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready&quot;. Claude responds: &quot;I'll help you find drafts from the last three months and check if they've been published. Let me start by looking at your drafts folder.&quot; Below is an expanded &quot;Running command&quot; section showing Request JSON with command: find /sessions/zealous-bold-ramanujan/mnt/blog-drafts -type f \\( -name \&quot;*.md\&quot; -o -name \&quot;*.txt\&quot; -o -name \&quot;*.html\&quot; \\) -mtime -90 -exec ls -la {} \\;, description: Find draft files modified in the last 90 days. Response text begins: &quot;Found 46 draft files. Next let me read the content of each to get their titles/topics, then&quot;. Right sidebar shows Progress section with three circular indicators (two checked, one pending) and text &quot;Steps will show as the task unfolds.&quot;, Artifacts section listing &quot;publish-encouragement.html&quot;, Context section with &quot;Selected folders&quot; showing &quot;blog-drafts&quot; folder, Connectors showing &quot;Web search&quot;, and Working files listing &quot;llm-digest-october-2025.md&quot;, &quot;tests-not-optional-coding-agen...&quot;, and &quot;digest-november-2025.md&quot;. 
Bottom shows reply input field, &quot;Opus 4.5&quot; model selector, user &quot;Simon Willison&quot; with &quot;Max plan&quot;, and disclaimer &quot;Claude is AI and can make mistakes. Please double-check responses.&quot;" style="max-width: 100%;" /></p>
<p>It started by running this command:</p>
<div class="highlight highlight-source-shell"><pre>find /sessions/zealous-bold-ramanujan/mnt/blog-drafts \
-type f <span class="pl-cce">\(</span> -name <span class="pl-s"><span class="pl-pds">"</span>*.md<span class="pl-pds">"</span></span> -o -name <span class="pl-s"><span class="pl-pds">"</span>*.txt<span class="pl-pds">"</span></span> -o -name <span class="pl-s"><span class="pl-pds">"</span>*.html<span class="pl-pds">"</span></span> <span class="pl-cce">\)</span> \
-mtime -90 -exec ls -la {} <span class="pl-cce">\;</span></pre></div>
<p>That <code>/sessions/zealous-bold-ramanujan/mnt/blog-drafts</code> path instantly caught my eye. Anthropic say that Cowork can only access files you grant it access to - it looks to me like they're mounting those files into a containerized environment, which should mean we can trust Cowork not to be able to access anything outside of that sandbox.</p>
<p>It turns out I have 46 draft files from the past three months. Claude then went to work with its search tool, running 44 individual searches against <code>site:simonwillison.net</code> to figure out which of my drafts had already been published.</p>
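<p>For anyone who would rather not decode <code>find</code> flags, here is a rough Python sketch of the same search. It is not what Cowork ran: the function name <code>recent_drafts</code> and its parameters are hypothetical, and the cutoff only approximates how <code>-mtime -90</code> rounds file ages.</p>

```python
import os
import time
from pathlib import Path

# Same three extensions the find command matched on.
DRAFT_SUFFIXES = {".md", ".txt", ".html"}


def recent_drafts(root, days=90, now=None):
    """Return draft files under root modified within roughly the last `days` days.

    Hypothetical helper: it approximates `find ROOT -type f -mtime -90`
    restricted to the suffixes above.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    matches = []
    for path in Path(root).rglob("*"):
        # Only regular files with one of the draft suffixes count.
        if path.is_file() and path.suffix in DRAFT_SUFFIXES:
            if path.stat().st_mtime >= cutoff:
                matches.append(path)
    return sorted(matches)
```

<p>Calling <code>recent_drafts("blog-drafts")</code> on a folder of drafts should return the same kind of list the agent started from, ready to be checked against published posts.</p>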
<p>Here's the eventual reply:</p>
<blockquote>
<p>Based on my analysis, here are <strong>your unpublished drafts that appear closest to being ready for publication</strong>:</p>
<hr />
<p><strong>🔥 Most Ready to Publish (substantial content, not yet published)</strong></p>
<ol>
<li>
<strong><code>freq-argued-llms.md</code></strong> - "Frequently Argued Questions about LLMs" (22,602 bytes)
<ul>
<li>This is a meaty piece documenting common arguments about LLMs with your counterpoints</li>
<li>Well-structured with a TL;DR and multiple sections</li>
<li>No matching published article found on your site</li>
<li>
<strong>Very close to ready</strong> - just needs a final review pass</li>
</ul>
</li>
<li>
<strong><code>claude-code-timeline.md</code></strong> - "Claude Code Timeline and Codex Timeline" (3,075 bytes)
<ul>
<li>About viewing JSONL session logs from Claude Code and Codex</li>
<li>You published <code>claude-code-transcripts</code> on Dec 25, but this appears to be a different/earlier piece about timeline viewing tools</li>
<li>Shorter but seems complete</li>
</ul>
</li>
<li>
<strong><code>datasette-1a20-upgrade-instructions.md</code></strong> - Plugin Upgrade Guide (3,147 bytes)
<ul>
<li>Technical guide for plugin authors</li>
<li>You published the main 1.0a20 announcement but this companion upgrade guide appears unpublished</li>
<li>Would be valuable for plugin maintainers</li>
</ul>
</li>
</ol>
</blockquote>
<p>That's a good response! It found exactly what I needed to see, although those upgrade instructions are actually published elsewhere now (<a href="https://docs.datasette.io/en/latest/upgrade_guide.html">in the Datasette docs</a>) and weren't actually intended for my blog.</p>
<p>Just for fun, and because I <a href="https://simonwillison.net/2024/Oct/21/claude-artifacts/">really like artifacts</a>, I asked for a follow-up:</p>
<blockquote>
<p>Make me an artifact with exciting animated encouragements to get me to do it</p>
</blockquote>
<p>Here's what I got:</p>
<p><img src="https://static.simonwillison.net/static/2026/claude-cowork-artifact.jpg" alt="Screenshot of the same Claude AI desktop application Cowork interface, now showing completed task results. Left panel shows &quot;1 step &gt;&quot; with link &quot;View your animated encouragement page&quot;. Claude's response reads: &quot;I created an over-the-top animated encouragement page with:&quot; followed by bullet points: &quot;🚀 Pulsing rockets and bouncing stats&quot;, &quot;✨ Falling emoji rain and confetti&quot;, &quot;🔥 Dancing fire emojis around your draft title&quot;, &quot;💫 Sparkles that follow your mouse&quot;, &quot;📊 An animated '95% ready' progress bar&quot;, &quot;💬 Rotating motivational quotes&quot;, &quot;🎉 A 'I'M GONNA DO IT!' button that triggers an explosion of confetti when clicked&quot;. Center shows an artifact preview of the generated HTML page with dark background featuring animated rocket emojis, large white text &quot;PUBLISH TIME!&quot;, stats showing &quot;22,602 bytes of wisdom waiting&quot;, &quot;95% ready to ship&quot;, infinity symbol with &quot;future arguments saved&quot;, and a fire emoji with yellow text &quot;Frequently&quot; (partially visible). Top toolbar shows &quot;Open in Firefox&quot; button. Right sidebar displays Progress section with checkmarks, Artifacts section with &quot;publish-encouragement.html&quot; selected, Context section showing &quot;blog-drafts&quot; folder, &quot;Web search&quot; connector, and Working files listing &quot;llm-digest-october-2025.md&quot;, &quot;tests-not-optional-coding-agen...&quot;, and &quot;digest-november-2025.md&quot;. Bottom shows reply input, &quot;Opus 4.5&quot; model selector, and disclaimer text." style="max-width: 100%;" /></p>
<p>I couldn't figure out how to close the right sidebar, so the artifact ended up cramped into a thin column, but it did work. I expect Anthropic will fix that display bug pretty quickly.</p>
<h4 id="isn-t-this-just-claude-code-">Isn't this just Claude Code?</h4>
<p>I've seen a few people ask what the difference between this and regular Claude Code is. The answer is <em>not a lot</em>. As far as I can tell Claude Cowork is regular Claude Code wrapped in a less intimidating default interface and with a filesystem sandbox configured for you without you needing to know what a "filesystem sandbox" is.</p>
<p><strong>Update</strong>: It's more than just a filesystem sandbox - I had Claude Code reverse engineer the Claude app and <a href="https://gist.github.com/simonw/35732f187edbe4fbd0bf976d013f22c8">it found out</a> that Claude uses VZVirtualMachine - the Apple Virtualization Framework - and downloads and boots a custom Linux root filesystem.</p>
<p>I think that's a really smart product. Claude Code has an enormous amount of value that hasn't yet been unlocked for a general audience, and this seems like a pragmatic approach.</p>
<h4 id="the-ever-present-threat-of-prompt-injection">The ever-present threat of prompt injection</h4>
<p>With a feature like this, my first thought always jumps straight to security. How big is the risk that someone using this might be hit by hidden malicious instructions somewhere that break their computer or steal their data?</p>
<p>Anthropic touch on that directly in the announcement:</p>
<blockquote>
<p>You should also be aware of the risk of "<a href="https://www.anthropic.com/research/prompt-injection-defenses">prompt injections</a>": attempts by attackers to alter Claude's plans through content it might encounter on the internet. We've built sophisticated defenses against prompt injections, but agent safety---that is, the task of securing Claude's real-world actions---is still an active area of development in the industry.</p>
<p>These risks aren't new with Cowork, but it might be the first time you're using a more advanced tool that moves beyond a simple conversation. We recommend taking precautions, particularly while you learn how it works. We provide more detail in our <a href="https://support.claude.com/en/articles/13364135-using-cowork-safely">Help Center</a>.</p>
</blockquote>
<p>That help page includes the following tips:</p>
<blockquote>
<p>To minimize risks:</p>
<ul>
<li>Avoid granting access to local files with sensitive information, like financial documents.</li>
<li>When using the Claude in Chrome extension, limit access to trusted sites.</li>
<li>If you chose to extend Claude’s default internet access settings, be careful to only extend internet access to sites you trust.</li>
<li>Monitor Claude for suspicious actions that may indicate prompt injection.</li>
</ul>
</blockquote>
<p>I do not think it is fair to tell regular non-programmer users to watch out for "suspicious actions that may indicate prompt injection"!</p>
<p>I'm sure they have some impressive mitigations going on behind the scenes. I recently learned, via <a href="https://x.com/bcherny/status/1989025306980860226">this tweet</a> from Claude Code creator Boris Cherny, that the summarization applied by the WebFetch function in Claude Code and now in Cowork is partly intended as a prompt injection protection layer:</p>
<blockquote>
<p>Summarization is one thing we do to reduce prompt injection risk. Are you running into specific issues with it?</p>
</blockquote>
<p>But Anthropic are being honest here with their warnings: they can attempt to filter out potential attacks all they like but the one thing they can't provide is guarantees that no future attack will be found that sneaks through their defenses and steals your data (see <a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/">the lethal trifecta</a> for more on this.)</p>
<p>The problem with prompt injection remains that until there's a high-profile incident, it's really hard to get people to take it seriously. I myself have all sorts of Claude Code usage that could cause havoc if a malicious injection got in. Cowork does at least run in a filesystem sandbox by default, which is more than can be said for my <code>claude --dangerously-skip-permissions</code> habit!</p>
<p>I wrote more about this in my 2025 round-up: <a href="https://simonwillison.net/2025/Dec/31/the-year-in-llms/#the-year-of-yolo-and-the-normalization-of-deviance">The year of YOLO and the Normalization of Deviance</a>.</p>
<h4 id="this-is-still-a-strong-signal-of-the-future">This is still a strong signal of the future</h4>
<p>Security worries aside, Cowork represents something really interesting. This is a general agent that looks well positioned to bring the wildly powerful capabilities of Claude Code to a wider audience.</p>
<p>I would be very surprised if Gemini and OpenAI don't follow suit with their own offerings in this category.</p>
<p>I imagine OpenAI are already regretting burning the name "ChatGPT Agent" on their janky, experimental and mostly forgotten browser automation tool <a href="https://simonwillison.net/2025/Aug/4/chatgpt-agents-user-agent/">back in August</a>!</p>
<h4 id="bonus-and-a-silly-logo">Bonus: and a silly logo</h4>
<p>bashtoni <a href="https://news.ycombinator.com/item?id=46593022#46593553">on Hacker News</a>:</p>
<blockquote>
<p>Simple suggestion: logo should be a cow and and orc to match how I originally read the product name.</p>
</blockquote>
<p>I couldn't resist <a href="https://gist.github.com/simonw/d06dec3d62dee28f2bd993eb78beb2ce">throwing that one at Nano Banana</a>:</p>
<p><img src="https://static.simonwillison.net/static/2026/cow-ork.jpg" alt="An Anthropic-style logo with a cow and an orc on it" style="max-width: 100%;" /></p>
<p>Tags: <a href="https://simonwillison.net/tags/sandboxing">sandboxing</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/prompt-injection">prompt-injection</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/anthropic">anthropic</a>, <a href="https://simonwillison.net/tags/claude">claude</a>, <a href="https://simonwillison.net/tags/ai-agents">ai-agents</a>, <a href="https://simonwillison.net/tags/claude-code">claude-code</a>, <a href="https://simonwillison.net/tags/lethal-trifecta">lethal-trifecta</a>, <a href="https://simonwillison.net/tags/claude-cowork">claude-cowork</a></p>
</summary><category term="sandboxing"/><category term="ai"/><category term="prompt-injection"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-agents"/><category term="claude-code"/><category term="lethal-trifecta"/><category term="claude-cowork"/></entry><entry><title>Don't fall into the anti-AI hype</title><link href="https://simonwillison.net/2026/Jan/11/dont-fall-into-the-anti-ai-hype/#atom-everything" rel="alternate"/><published>2026-01-11T23:58:43+00:00</published><updated>2026-01-11T23:58:43+00:00</updated><id>https://simonwillison.net/2026/Jan/11/dont-fall-into-the-anti-ai-hype/#atom-everything</id><summary type="html">
<p><strong><a href="https://antirez.com/news/158">Don&#x27;t fall into the anti-AI hype</a></strong></p>
<p>I'm glad someone was brave enough to say this. There is a <em>lot</em> of anti-AI sentiment in the software development community these days. Much of it is justified, but it's becoming clear that if you let people convince you that AI isn't genuinely useful for software developers, or that this whole thing will blow over soon, you're taking on a very real risk to your future career.</p>
<p>As Salvatore Sanfilippo puts it:</p>
<blockquote>
<p>It does not matter if AI companies will not be able to get their money back and the stock market will crash. All that is irrelevant, in the long run. It does not matter if this or the other CEO of some unicorn is telling you something that is off putting, or absurd. Programming changed forever, anyway.</p>
</blockquote>
<p>I do like this hopeful positive outlook on what this could all mean, emphasis mine:</p>
<blockquote>
<p>How do I feel, about all the code I wrote that was ingested by LLMs? I feel great to be part of that, because I see this as a continuation of what I tried to do all my life: democratizing code, systems, knowledge. <strong>LLMs are going to help us to write better software, faster, and will allow small teams to have a chance to compete with bigger companies</strong>. The same thing open source software did in the 90s.</p>
</blockquote>
<p>This post has been the subject of heated discussions all day today on both <a href="https://news.ycombinator.com/item?id=46574276">Hacker News</a> and <a href="https://lobste.rs/s/cmsfbu/don_t_fall_into_anti_ai_hype">Lobste.rs</a>.</p>
<p>Tags: <a href="https://simonwillison.net/tags/salvatore-sanfilippo">salvatore-sanfilippo</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/ai-ethics">ai-ethics</a></p>
</summary><category term="salvatore-sanfilippo"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-ethics"/></entry><entry><title>My answers to the questions I posed about porting open source code with LLMs</title><link href="https://simonwillison.net/2026/Jan/11/answers/#atom-everything" rel="alternate"/><published>2026-01-11T22:59:23+00:00</published><updated>2026-01-11T22:59:23+00:00</updated><id>https://simonwillison.net/2026/Jan/11/answers/#atom-everything</id><summary type="html">
<p>Last month I <a href="https://simonwillison.net/2025/Dec/15/porting-justhtml/">wrote about porting JustHTML from Python to JavaScript</a> using Codex CLI and GPT-5.2 in a few hours while also buying a Christmas tree and watching Knives Out 3. I ended that post with a series of open questions about the ethics and legality of this style of work. Alexander Petros on <a href="https://lobste.rs/">lobste.rs</a> just <a href="https://lobste.rs/s/cmsfbu/don_t_fall_into_anti_ai_hype#c_cqkdve">challenged me to answer them</a>, which is fair enough! Here's my attempt at that.</p>
<p>You can read <a href="https://simonwillison.net/2025/Dec/15/porting-justhtml/">the original post</a> for background, but the short version is that it's now possible to point a coding agent at some other open source project and effectively tell it "port this to language X and make sure the tests still pass" and have it do exactly that.</p>
<p>Here are the questions I posed along with my answers based on my current thinking. Extra context is that I've since tried variations on a similar theme a few more times using Claude Code and Opus 4.5 and found it to be <em>astonishingly</em> effective.</p>
<h4 id="does-this-library-represent-a-legal-violation-of-copyright-of-either-the-rust-library-or-the-python-one">Does this library represent a legal violation of copyright of either the Rust library or the Python one?</h4>
<p>I decided that the right thing to do here was to <a href="https://github.com/simonw/justjshtml/commit/a415d0af40c34bf9a856e956d841513f482867e3">keep the open source license and copyright statement</a> from the Python library author and treat what I had built as a derivative work, which is the entire <em>point</em> of open source.</p>
<h4 id="even-if-this-is-legal-is-it-ethical-to-build-a-library-in-this-way">Even if this is legal, is it ethical to build a library in this way?</h4>
<p>After sitting on this for a while I've come down on yes, provided full credit is given and the license is carefully considered. Open source allows and encourages further derivative works! I never got upset at some university student forking one of my projects on GitHub and hacking in a new feature that they used. I don't think this is materially different, although a port to another language entirely does feel like a slightly different shape.</p>
<h4 id="does-this-format-of-development-hurt-the-open-source-ecosystem">Does this format of development hurt the open source ecosystem?</h4>
<p>Now this one is complicated!</p>
<p>It definitely hurts some projects because there are open source maintainers out there who say things like "I'm not going to release any open source code any more because I don't want it used for training" - I expect some of those would be equally angered by LLM-driven derived works.</p>
<p>I don't know how serious this problem is - I've seen angry comments from anonymous usernames, but do they represent genuine open source contributors or are they just angry anonymous usernames?</p>
<p>If we assume this is real, does the loss of those individuals get balanced out by the increase in individuals who CAN contribute to open source because they can now get work done in a few hours that might previously have taken them a few days that they didn't have to spare?</p>
<p>I'll be brutally honest about that question: I think that if "they might train on my code / build a derived version with an LLM" is enough to drive you away from open source, your open source values are distinct enough from mine that I'm not ready to invest significantly in keeping you. I'll put that effort into welcoming the newcomers instead.</p>
<p>The much bigger concern for me is the impact of generative AI on <em>demand</em> for open source. The recent <a href="https://simonwillison.net/2026/Jan/7/adam-wathan/">Tailwind story</a> is a visible example of this. Tailwind blamed LLMs for reduced traffic to their documentation, resulting in fewer conversions to their paid component library. I suspect the reduced demand is instead because LLMs make it easy enough to build good-enough versions of those components for free that people do that instead.</p>
<p>I've found myself affected by this for open source dependencies too. The other day I wanted to parse a cron expression in some Go code. Usually I'd go looking for an existing library for cron expression parsing - but this time I hardly thought about that for a second before prompting one (complete with extensive tests) into existence instead.</p>
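<p>To give a sense of scale, here is a minimal sketch of that kind of parser, in Python rather than Go. It is an illustration only, not the parser described above: the names <code>expand_field</code> and <code>parse_cron</code> are hypothetical, it assumes the classic five-field format with numeric values only, and it reads <code>N/step</code> as "from N to the field's maximum", as some cron dialects do.</p>

```python
# Field names and inclusive bounds for a classic five-field cron expression.
# Assumption: day of week runs 0-6; named months/days are not supported.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def expand_field(field, low, high):
    """Expand one cron field such as "*/15", "1-5" or "0,30" into a sorted list."""
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if not step >= 1:
            raise ValueError(f"invalid step in {part!r}")
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-", 1)
            start, end = int(a), int(b)
        else:
            start = int(rng)
            # Assumption: "N/step" runs from N up to the field maximum.
            end = high if step_text else start
        if start > end or low > start or end > high:
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def parse_cron(expression):
    """Parse a five-field cron expression into a dict of allowed values."""
    parts = expression.split()
    if len(parts) != len(FIELDS):
        raise ValueError("expected five whitespace-separated fields")
    return {
        name: expand_field(part, low, high)
        for part, (name, low, high) in zip(parts, FIELDS)
    }
```

<p>A real library would also handle named days and months, <code>@daily</code>-style shortcuts and next-run calculation, which is where the extensive tests earn their keep.</p>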
<p>I expect that this is going to quite radically impact the shape of the open source library world over the next few years. Is that "harmful to open source"? It may well be. I'm hoping that whatever new shape comes out of this has its own merits, but I don't know what those would be.</p>
<h4 id="can-i-even-assert-copyright-over-this-given-how-much-of-the-work-was-produced-by-the-llm">Can I even assert copyright over this, given how much of the work was produced by the LLM?</h4>
<p>I'm not a lawyer so I don't feel qualified to comment on this one. My loose hunch is that I'm still putting enough creative control in through the way I direct the models for that to count as enough human intervention, at least under US law, but I have no idea.</p>
<h4 id="is-it-responsible-to-publish-software-libraries-built-in-this-way">Is it responsible to publish software libraries built in this way?</h4>
<p>I've come down on "yes" here, again because I never thought it was irresponsible for some random university student to slap an Apache license on some bad code they just coughed up on GitHub.</p>
<p>What's important here is making it very clear to potential users what they should expect from that software. I've started publishing my AI-generated and not 100% reviewed libraries as alphas, which I'm tentatively thinking of as <strong>"alpha slop"</strong>. I'll take the alpha label off once I've used them in production to the point that I'm willing to stake my reputation on them being decent implementations, and I'll ship a 1.0 version when I'm confident that they are a solid bet for other people to depend on. I think that's the responsible way to handle this.</p>
<h4 id="how-much-better-would-this-library-be-if-an-expert-team-hand-crafted-it-over-the-course-of-several-months">How much better would this library be if an expert team hand crafted it over the course of several months?</h4>
<p>That one was a deliberately provocative question, because for a new HTML5 parsing library that passes 9,200 tests you would need a <em>very</em> good reason to hire an expert team for two months (at a cost of hundreds of thousands of dollars) to write such a thing. And honestly, thanks to the existing conformance suites this kind of library is simple enough that you may find their results weren't notably better than the one written by the coding agent.</p>
<p>Tags: <a href="https://simonwillison.net/tags/definitions">definitions</a>, <a href="https://simonwillison.net/tags/open-source">open-source</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-assisted-programming">ai-assisted-programming</a>, <a href="https://simonwillison.net/tags/ai-ethics">ai-ethics</a>, <a href="https://simonwillison.net/tags/conformance-suites">conformance-suites</a></p>
</summary><category term="definitions"/><category term="open-source"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-ethics"/><category term="conformance-suites"/></entry></feed>
{
"accept-ranges": "bytes",
"access-control-allow-methods": "GET, OPTIONS",
"access-control-allow-origin": "*",
"access-control-max-age": "1000",
"age": "461",
"cache-control": "s-maxage=600",
"cf-cache-status": "HIT",
"cf-ray": "9c50874ac4f5724e-CMH",
"connection": "keep-alive",
"content-length": "174694",
"content-type": "application/xml; charset=utf-8",
"date": "Wed, 28 Jan 2026 12:35:13 GMT",
"django-composition": "Improvisation #3",
"last-modified": "Tue, 27 Jan 2026 16:58:08 GMT",
"nel": "{\"report_to\":\"heroku-nel\",\"response_headers\":[\"Via\"],\"max_age\":3600,\"success_fraction\":0.01,\"failure_fraction\":0.1}",
"referrer-policy": "strict-origin-when-cross-origin",
"report-to": "{\"group\":\"heroku-nel\",\"endpoints\":[{\"url\":\"https://nel.heroku.com/reports?s=MlNzhZlCoh9LibrcBpbAT5W8OIt%2FaRMc6HEq4iSuUes%3D\\u0026sid=c46efe9b-d3d2-4a0c-8c76-bfafa16c5add\\u0026ts=1769603252\"}],\"max_age\":3600}",
"reporting-endpoints": "heroku-nel=\"https://nel.heroku.com/reports?s=MlNzhZlCoh9LibrcBpbAT5W8OIt%2FaRMc6HEq4iSuUes%3D&sid=c46efe9b-d3d2-4a0c-8c76-bfafa16c5add&ts=1769603252\"",
"server": "cloudflare",
"via": "1.1 heroku-router",
"x-content-type-options": "nosniff"
}
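A rough sketch of how the caching headers above fit together, assuming a shared cache that honors s-maxage and treats the age header as the response's current age. The function name remaining_shared_freshness is hypothetical, and the headers are assumed to arrive as a dict with lower-case keys, as in the dump above.

```python
import re


def remaining_shared_freshness(headers):
    """Return seconds of freshness left in a shared cache, or None.

    Hypothetical helper: reads s-maxage from cache-control and subtracts
    the age header. A result of zero or less means the cached copy is stale.
    """
    match = re.search(r"s-maxage=(\d+)", headers.get("cache-control", ""))
    if match is None:
        return None  # no shared-cache lifetime was advertised
    lifetime = int(match.group(1))
    age = int(headers.get("age", "0"))
    return lifetime - age


headers = {"cache-control": "s-maxage=600", "age": "461"}
remaining = remaining_shared_freshness(headers)
```

With the values above (s-maxage=600, age 461) this leaves 139 seconds before the cached copy goes stale. Because no ETag is sent, a client revalidating after that point would have to rely on last-modified.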
{
"meta": {
"type": "atom",
"version": "1.0"
},
"language": "en-us",
"title": "Simon Willison's Weblog",
"description": null,
"copyright": null,
"url": "http://simonwillison.net/",
"self": "http://simonwillison.net/atom/everything/",
"published": null,
"updated": "2026-01-27T16:58:08.000Z",
"generator": null,
"image": null,
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [],
"items": [
{
"id": "https://simonwillison.net/2026/Jan/27/one-human-one-agent-one-browser/#atom-everything",
"title": "One Human + One Agent = One Browser From Scratch",
"description": "<p><strong><a href=\"https://emsh.cat/one-human-one-agent-one-browser/\">One Human + One Agent = One Browser From Scratch</a></strong></p>\nembedding-shapes was <a href=\"https://emsh.cat/cursor-implied-success-without-evidence/\">so infuriated</a> by the hype around Cursor's <a href=\"https://simonwillison.net/2026/Jan/23/fastrender/\">FastRender browser project</a> - thousands of parallel agents producing ~1.6 million lines of Rust - that they were inspired to take a go at building a web browser using coding agents themselves.</p>\n<p>The result is <a href=\"https://github.com/embedding-shapes/one-agent-one-browser\">one-agent-one-browser</a> and it's <em>really</em> impressive. Over three days they drove a single Codex CLI agent to build 20,000 lines of Rust that successfully renders HTML+CSS with no Rust crate dependencies at all - though it does (reasonably) use Windows, macOS and Linux system frameworks for image and text rendering.</p>\n<p>I installed the <a href=\"https://github.com/embedding-shapes/one-agent-one-browser/releases/tag/0.1.0\">1MB macOS binary release</a> and ran it against my blog:</p>\n<pre><code>chmod 755 ~/Downloads/one-agent-one-browser-macOS-ARM64 \n~/Downloads/one-agent-one-browser-macOS-ARM64 https://simonwillison.net/\n</code></pre>\n<p>Here's the result:</p>\n<p><img alt=\"My blog rendered in a window. Everything is in the right place, the CSS gradients look good, the feed subscribe SVG icon is rendered correctly but there's a missing PNG image.\" src=\"https://static.simonwillison.net/static/2026/one-agent-simonwillison.jpg\" /></p>\n<p>It even rendered my SVG feed subscription icon! 
A PNG image is missing from the page, which looks like an intermittent bug (there's code to render PNGs).</p>\n<p>The code is pretty readable too - here's <a href=\"https://github.com/embedding-shapes/one-agent-one-browser/blob/0.1.0/src/layout/flex.rs\">the flexbox implementation</a>.</p>\n<p>I had thought that \"build a web browser\" was the ideal prompt to really stretch the capabilities of coding agents - and that it would take sophisticated multi-agent harnesses (as seen in the Cursor project) and millions of lines of code to achieve.</p>\n<p>Turns out one agent driven by a talented engineer, three days and 20,000 lines of Rust is enough to get a very solid basic renderer working!</p>\n<p>I'm going to upgrade my <a href=\"https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#3-years-someone-will-build-a-new-browser-using-mainly-ai-assisted-coding-and-it-won-t-even-be-a-surprise\">prediction for 2029</a>: I think we're going to get a <em>production-grade</em> web browser built by a small team using AI assistance by then.\n\n <p><small></small>Via <a href=\"https://news.ycombinator.com/item?id=46779522\">Show Hacker News</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/browsers\">browsers</a>, <a href=\"https://simonwillison.net/tags/predictions\">predictions</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/rust\">rust</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-assisted-programming\">ai-assisted-programming</a>, <a href=\"https://simonwillison.net/tags/coding-agents\">coding-agents</a>, <a href=\"https://simonwillison.net/tags/codex-cli\">codex-cli</a>, <a href=\"https://simonwillison.net/tags/browser-challenge\">browser-challenge</a></p>",
"url": "https://simonwillison.net/2026/Jan/27/one-human-one-agent-one-browser/#atom-everything",
"published": "2026-01-27T16:58:08.000Z",
"updated": "2026-01-27T16:58:08.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "browsers",
"term": "browsers",
"url": null
},
{
"label": "predictions",
"term": "predictions",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "rust",
"term": "rust",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-assisted-programming",
"term": "ai-assisted-programming",
"url": null
},
{
"label": "coding-agents",
"term": "coding-agents",
"url": null
},
{
"label": "codex-cli",
"term": "codex-cli",
"url": null
},
{
"label": "browser-challenge",
"term": "browser-challenge",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/27/kimi-k25/#atom-everything",
"title": "Kimi K2.5: Visual Agentic Intelligence",
"description": "<p><strong><a href=\"https://www.kimi.com/blog/kimi-k2-5.html\">Kimi K2.5: Visual Agentic Intelligence</a></strong></p>\nKimi K2 landed <a href=\"https://simonwillison.net/2025/Jul/11/kimi-k2/\">in July</a> as a 1 trillion parameter open weight LLM. It was joined by Kimi K2 Thinking <a href=\"https://simonwillison.net/2025/Nov/6/kimi-k2-thinking/\">in November</a> which added reasoning capabilities. Now they've made it multi-modal: the K2 models were text-only, but the new 2.5 can handle image inputs as well:</p>\n<blockquote>\n<p>Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.</p>\n</blockquote>\n<p>The \"self-directed agent swarm paradigm\" claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once:</p>\n<blockquote>\n<p>For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.</p>\n</blockquote>\n<p>I used the <a href=\"https://openrouter.ai/moonshotai/kimi-k2.5\">OpenRouter Chat UI</a> to have it \"Generate an SVG of a pelican riding a bicycle\", and it did <a href=\"https://gist.github.com/simonw/32a85e337fbc6ee935d10d89726c0476\">quite well</a>:</p>\n<p><img alt=\"Cartoon illustration of a white pelican with a large orange beak and yellow throat pouch riding a green bicycle with yellow feet on the pedals, set against a light blue sky with soft bokeh circles and a green grassy hill. The bicycle frame is a little questionable. The pelican is quite good. 
The feet do not quite align with the pedals, which are floating clear of the frame.\" src=\"https://static.simonwillison.net/static/2026/kimi-k2.5-pelican.png\" /></p>\n<p>As a more interesting test, I decided to exercise the claims around multi-agent planning with this prompt:</p>\n<blockquote>\n<p>I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.</p>\n</blockquote>\n<p>Here's <a href=\"https://gist.github.com/simonw/ee2583b2eb5706400a4737f56d57c456\">the full response</a>. It produced ten realistic tasks and reasoned through the dependencies between them. For comparison here's the same prompt <a href=\"https://claude.ai/share/df9258e7-97ba-4362-83da-76d31d96196f\">against Claude Opus 4.5</a> and <a href=\"https://chatgpt.com/share/6978d48c-3f20-8006-9c77-81161f899104\">against GPT-5.2 Thinking</a>.</p>\n<p>The <a href=\"https://huggingface.co/moonshotai/Kimi-K2.5\">Hugging Face repository</a> is 595GB. The model uses Kimi's janky \"modified MIT\" license, which adds the following clause:</p>\n<blockquote>\n<p>Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display \"Kimi K2.5\" on the user interface of such product or service.</p>\n</blockquote>\n<p>Given the model's size, I expect one way to run it locally would be with MLX and a pair of $10,000 512GB RAM M3 Ultra Mac Studios. 
That setup has <a href=\"https://twitter.com/awnihannun/status/1943723599971443134\">been demonstrated to work</a> with previous trillion parameter K2 models.\n\n <p><small>Via <a href=\"https://news.ycombinator.com/item?id=46775961\">Hacker News</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/hugging-face\">hugging-face</a>, <a href=\"https://simonwillison.net/tags/vision-llms\">vision-llms</a>, <a href=\"https://simonwillison.net/tags/llm-tool-use\">llm-tool-use</a>, <a href=\"https://simonwillison.net/tags/ai-agents\">ai-agents</a>, <a href=\"https://simonwillison.net/tags/pelican-riding-a-bicycle\">pelican-riding-a-bicycle</a>, <a href=\"https://simonwillison.net/tags/llm-release\">llm-release</a>, <a href=\"https://simonwillison.net/tags/ai-in-china\">ai-in-china</a>, <a href=\"https://simonwillison.net/tags/moonshot\">moonshot</a>, <a href=\"https://simonwillison.net/tags/parallel-agents\">parallel-agents</a>, <a href=\"https://simonwillison.net/tags/kimi\">kimi</a>, <a href=\"https://simonwillison.net/tags/janky-licenses\">janky-licenses</a></p>",
"url": "https://simonwillison.net/2026/Jan/27/kimi-k25/#atom-everything",
"published": "2026-01-27T15:07:41.000Z",
"updated": "2026-01-27T15:07:41.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "hugging-face",
"term": "hugging-face",
"url": null
},
{
"label": "vision-llms",
"term": "vision-llms",
"url": null
},
{
"label": "llm-tool-use",
"term": "llm-tool-use",
"url": null
},
{
"label": "ai-agents",
"term": "ai-agents",
"url": null
},
{
"label": "pelican-riding-a-bicycle",
"term": "pelican-riding-a-bicycle",
"url": null
},
{
"label": "llm-release",
"term": "llm-release",
"url": null
},
{
"label": "ai-in-china",
"term": "ai-in-china",
"url": null
},
{
"label": "moonshot",
"term": "moonshot",
"url": null
},
{
"label": "parallel-agents",
"term": "parallel-agents",
"url": null
},
{
"label": "kimi",
"term": "kimi",
"url": null
},
{
"label": "janky-licenses",
"term": "janky-licenses",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/26/tests/#atom-everything",
"title": "Tips for getting coding agents to write good Python tests",
"description": "<p>Someone <a href=\"https://news.ycombinator.com/item?id=46765460#46765823\">asked</a> on Hacker News if I had any tips for getting coding agents to write decent quality tests. Here's what I said:</p>\n<hr />\n<p>I work in Python which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including things like usage of fixture libraries for mocking external HTTP APIs and snapshot testing and other neat patterns.</p>\n<p>Or I can say \"use pytest-httpx to mock the endpoints\" and Claude knows what I mean.</p>\n<p>Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code - which isn't a huge deal, I'm much more more tolerant of duplicated logic in tests than I am in implementation, but it's still worth pushing back on.</p>\n<p>\"Refactor those tests to use pytest.mark.parametrize\" and \"extract the common setup into a pytest fixture\" work really well there.</p>\n<p>Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.</p>\n<p>I find that once a project has clean basic tests the new tests added by the agents tend to match them in quality. 
It's similar to how working on large projects with a team of other developers works - keeping the code clean means when people look for examples of how to write a test they'll be pointed in the right direction.</p>\n<p>One last tip I use a lot is this:</p>\n<pre><code>Clone datasette/datasette-enrichments\nfrom GitHub to /tmp and imitate the\ntesting patterns it uses\n</code></pre>\n<p>I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.</p>\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/testing\">testing</a>, <a href=\"https://simonwillison.net/tags/coding-agents\">coding-agents</a>, <a href=\"https://simonwillison.net/tags/python\">python</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/hacker-news\">hacker-news</a>, <a href=\"https://simonwillison.net/tags/pytest\">pytest</a></p>",
"url": "https://simonwillison.net/2026/Jan/26/tests/#atom-everything",
"published": "2026-01-26T23:55:29.000Z",
"updated": "2026-01-26T23:55:29.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "testing",
"term": "testing",
"url": null
},
{
"label": "coding-agents",
"term": "coding-agents",
"url": null
},
{
"label": "python",
"term": "python",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "hacker-news",
"term": "hacker-news",
"url": null
},
{
"label": "pytest",
"term": "pytest",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/26/chatgpt-containers/#atom-everything",
"title": "ChatGPT Containers can now run bash, pip/npm install packages, and download files",
"description": "<p>One of my favourite features of ChatGPT is its ability to write and execute code in a container. This feature launched as ChatGPT Code Interpreter <a href=\"https://simonwillison.net/2023/Apr/12/code-interpreter/\">nearly three years ago</a>, was half-heartedly rebranded to \"Advanced Data Analysis\" at some point and is generally really difficult to find detailed documentation about. Case in point: it appears to have had a <em>massive</em> upgrade at some point in the past few months, and I can't find documentation about the new capabilities anywhere!</p>\n<p>Here are the most notable new features:</p>\n<ol>\n<li>ChatGPT can <strong>directly run Bash commands</strong> now. Previously it was limited to Python code only, although it could run shell commands via the Python <code>subprocess</code> module.</li>\n<li>\n<strong>It has Node.js</strong> and can run JavaScript directly in addition to Python. I also got it to run \"hello world\" in <strong>Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C and C++</strong>. No Rust yet though!</li>\n<li>While the container still can't make outbound network requests, <strong><code>pip install package</code> and <code>npm install package</code> both work</strong> now via a custom proxy mechanism.</li>\n<li>ChatGPT can locate the URL for a file on the web and use a <code>container.download</code> tool to <strong>download that file and save it to a path</strong> within the sandboxed container.</li>\n</ol>\n<p>This is a substantial upgrade! 
ChatGPT can now write and then test code in 10 new languages (11 if you count Bash), can find files online and download them into the container, and can install additional packages via <code>pip</code> and <code>npm</code> to help it solve problems.</p>\n<p>(OpenAI <em>really</em> need to develop better habits at <a href=\"https://help.openai.com/en/articles/6825453-chatgpt-release-notes\">keeping their release notes up-to-date</a>!)</p>\n<p>I was initially suspicious that maybe I'd stumbled into a new preview feature that wasn't available to everyone, but I <a href=\"https://chatgpt.com/share/6977aa7c-7bd8-8006-8129-8c9e25126fed\">tried some experiments</a> in a free ChatGPT account and confirmed that the new features are available there as well.</p>\n\n<ul>\n <li><a href=\"https://simonwillison.net/2026/Jan/26/chatgpt-containers/#container-download\">container.download</a></li>\n <li><a href=\"https://simonwillison.net/2026/Jan/26/chatgpt-containers/#is-container-download-a-data-exfiltration-vulnerability-\">Is container.download a data exfiltration vulnerability?</a></li>\n <li><a href=\"https://simonwillison.net/2026/Jan/26/chatgpt-containers/#bash-and-other-languages\">Bash and other languages</a></li>\n <li><a href=\"https://simonwillison.net/2026/Jan/26/chatgpt-containers/#installing-packages-from-pip-and-npm\">Installing packages from pip and npm</a></li>\n <li><a href=\"https://simonwillison.net/2026/Jan/26/chatgpt-containers/#adding-it-all-together\">Adding it all together</a></li>\n</ul>\n\n<h4 id=\"container-download\">container.download</h4>\n<p>My first clue to the new features came the other day when I got curious about Los Angeles air quality, in particular has the growing number of electric vehicles there had a measurable impact?</p>\n<p>I prompted a fresh GPT-5.2 Thinking session with:</p>\n<blockquote>\n<p>Show me Los Angeles air quality over time for last 20 years</p>\n</blockquote>\n<p>Here's the <a 
href=\"https://chatgpt.com/share/6977a80d-e6e4-8006-b4fe-ed8478fe99d8\">shared transcript</a>. Annoyingly that doesn't make the thinking trace available, so here's a screenshot:</p>\n<p><img src=\"https://static.simonwillison.net/static/2026/container-download.jpg\" alt=\"Exploring methods to fetch AQI data and reports Downloading AQI data for Los Angeles I found an xlsx file titled \"Air Quality Statistics by City, 2000-2023\", which should include data for Los Angeles. While user requested data for Using provided URL to download file The URL from the error message leads to an Excel file containing air quality trends. I can use the container.download tool to fetch it. Let’s proceed with this approach and see if it works out for downloading the file directly! Analyzed Parsing and analyzing the data for Los Angeles Now that we've downloaded the data, I'll parse it using Python. First, I'll focus on identifying the row(s) corresponding to Los Angeles, which may be labeled under \"Los Angeles-Long Beach-Anaheim, CA.\" Afterward, I'll plot metrics like \"Median AQI\" and \"Days Good\" over time to visualize trends. Let’s begin!\" style=\"max-width: 100%;\" /></p>\n<p>This bit instantly caught my eye:</p>\n<blockquote>\n<p>[...] The URL from the error message leads to an Excel file containing air quality trends. I can use the container.download tool to fetch it. [...]</p>\n</blockquote>\n<p>What's <code>container.download</code>? I <a href=\"https://chatgpt.com/share/6977aab0-b874-8006-9b54-d3d216ed884b\">asked ChatGPT</a>:</p>\n<blockquote>\n<p>Without running any searches tell me about your container.download tool</p>\n</blockquote>\n<p>And got back:</p>\n<blockquote>\n<p>Sure. 
<code>container.download</code> is a built-in tool I can call that <strong>fetches a file from a URL and saves it into the sandboxed container filesystem</strong> for this chat session.</p>\n<h4 id=\"what-it-does\">What it does</h4>\n<ul>\n<li>Takes a <strong>publicly reachable URL</strong> and a <strong>destination filepath</strong> in the container.</li>\n<li>Downloads the bytes from that URL and writes them to the given path.</li>\n<li>After that, I can read/process the file locally in the container (e.g., unzip it, parse it with Python, open it as an image, convert it, etc.).</li>\n</ul>\n<h4 id=\"signature-inputs-\">Signature (inputs)</h4>\n<p>It’s called like this (conceptually):</p>\n<div class=\"highlight highlight-source-json\"><pre>{\n <span class=\"pl-ent\">\"url\"</span>: <span class=\"pl-s\"><span class=\"pl-pds\">\"</span>https://example.com/somefile.zip<span class=\"pl-pds\">\"</span></span>,\n <span class=\"pl-ent\">\"filepath\"</span>: <span class=\"pl-s\"><span class=\"pl-pds\">\"</span>/mnt/data/somefile.zip<span class=\"pl-pds\">\"</span></span>\n}</pre></div>\n</blockquote>\n<p>Further experimentation confirmed that <code>container.download</code> behaves as described.</p>\n<p>I had it download one file from a server with logs enabled and captured the IP address and user-agent. 
Here are the most interesting headers:</p>\n<pre><code>User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot\nAccept: text/html, application/xhtml+xml, application/xml;q=0.9, image/avif, image/webp, image/apng, */*;q=0.8, application/signed-exchange;v=b3;q=0.9\nCf-Connecting-Ip: 52.230.164.178\n</code></pre>\n<p>That <code>52.230.164.178</code> IP address <a href=\"http://ip-api.com/json/52.230.164.178\">resolves to</a> Microsoft Azure Cloud (centralus) in Des Moines, Iowa.</p>\n<h4 id=\"is-container-download-a-data-exfiltration-vulnerability-\">Is container.download a data exfiltration vulnerability?</h4>\n<p>On the one hand, this is really useful! ChatGPT can navigate around websites looking for useful files, download those files to a container and then process them using Python or other languages.</p>\n<p>Is this a data exfiltration vulnerability though? Could a prompt injection attack trick ChatGPT into leaking private data out to a <code>container.download</code> call to a URL with a query string that includes sensitive information?</p>\n<p>I don't think it can. I tried getting it to assemble a URL with a query string and access it using <code>container.download</code> and it couldn't do it. It told me that it got back this error:</p>\n<blockquote>\n<p>ERROR: download failed because url not viewed in conversation before. 
open the file or url using web.run first.</p>\n</blockquote>\n<p>This looks to me like the same safety trick <a href=\"https://simonwillison.net/2025/Sep/10/claude-web-fetch-tool/\">used by Claude's Web Fetch tool</a>: only allow URL access if that URL was either directly entered by the user or if it came from search results that could not have been influenced by a prompt injection.</p>\n<p>(I poked at this a bit more and managed to get a simple constructed query string to pass through <code>web.run</code> - a different tool entirely - but when I tried to compose a longer query string containing the previous prompt history a <code>web.run</code> filter blocked it.)</p>\n<p>So I <em>think</em> this is all safe, though I'm curious if it could hold firm against a more aggressive round of attacks from a seasoned security researcher.</p>\n<h4 id=\"bash-and-other-languages\">Bash and other languages</h4>\n<p>The key lesson from coding agents like Claude Code and Codex CLI is that Bash rules everything: if an agent can run Bash commands in an environment it can do almost anything that can be achieved by typing commands into a computer.</p>\n<p>When Anthropic added their own code interpreter feature to Claude <a href=\"https://simonwillison.net/2025/Sep/9/claude-code-interpreter/\">last September</a> they built that around Bash rather than just Python. It looks to me like OpenAI have now done the same thing for ChatGPT.</p>\n<p>Here's what ChatGPT looks like when it runs a Bash command - here my prompt was:</p>\n<blockquote>\n<p>npm install a fun package and demonstrate using it</p>\n</blockquote>\n<p><img src=\"https://static.simonwillison.net/static/2026/chatgpt-bash.jpg\" alt=\"Screenshot of ChatGPT 5.2 Thinking interface with Activity panel. 
Main chat shows user prompt \"npm install a fun package and demonstrate using it\" and response \"Thought for 32s > Yep — here's a fun one: cowsay 🐄 I installed it with npm and ran a tiny Node script:\" followed by bash code block containing \"mkdir -p /mnt/data/npmfun cd /mnt/data/npmfun npm init -y npm install cowsay --no-fund --no-audit node -e \"const cowsay=require('cowsay'); console.log(cowsay.say({text:'Hel\" and Output: section showing ASCII cow art saying \"< Hello from npm inside the container! >\" with cow drawing made of characters. Activity panel on right (32s) shows: \"Running npm install in container\" with bash command and output \"v22.16.0 10.9.2\", \"Inspecting npm command output and container status\" with directory listing showing \"drwxr-xr-x 2 root oai_shared 80 Jan 26 17:32 . drwxrwxr-x 2 oai oai_shared 60 Jan 26 17:32 .. -rw-r--r-- 1 root root\", \"Re-running npm install\" note about node_modules missing, and \"Rerunning npm install with output\" showing \"added 41 packages in 2s\".\" style=\"max-width: 100%;\" /></p>\n<p>It's useful to click on the \"Thinking\" or \"Thought for 32s\" links as that opens the Activity sidebar with a detailed trace of what ChatGPT did to arrive at its answer. This helps guard against cheating - ChatGPT might claim to have run Bash in the main window but it can't fake those black and white logs in the Activity panel.</p>\n<p>I had it run <a href=\"https://chatgpt.com/share/6977b890-f1c0-8006-b39a-9a02c7ca1993\">Hello World in various languages</a> later in that same session.</p>\n<p><img src=\"https://static.simonwillison.net/static/2026/chatgpt-hello-worlds.jpg\" alt=\"Terminal screenshot showing \"Hello, world\" programs in multiple languages. 
Top section shows code: swift -e 'print(\"Hello, world from Swift!\")' followed by cat > hello.kts <<'KTS' / println(\"Hello, world from Kotlin!\") / KTS / kotlinc -script hello.kts, then cat > hello.c <<'C' / #include <stdio.h> / int main(){ puts(\"Hello, world from C!\"); return 0; } / C / gcc hello.c -o hello_c / ./hello_c, then cat > hello.cpp <<'CPP' / #include <iostream> / int main(){ std::cout << \"Hello, world from C++!\" << std::endl; } / CPP / g++ hello.cpp -o hello_cpp / ./hello_cpp. Bottom section shows output: Hello, world from Ruby! / Hello, world from Perl! / Hello, world from PHP! / Hello, world from Go! / Hello, world from Java! / Hello, world from Swift! / Hello, world from Kotlin! / Hello, world from C! / Hello, world from C++!. UI shows \"Thought for 2m 29s\" and \"Done\" at bottom.\" style=\"max-width: 100%;\" /></p>\n<h4 id=\"installing-packages-from-pip-and-npm\">Installing packages from pip and npm</h4>\n<p>In the previous example ChatGPT installed the <code>cowsay</code> package from npm and used it to draw an ASCII-art cow. But how could it do that if the container can't make outbound network requests?</p>\n<p>In <a href=\"https://chatgpt.com/share/69773501-b6d8-8006-bbf2-fa644561aa26\">another session</a> I challenged it to explore its environment and figure out how that worked.</p>
\n<p>Here's <a href=\"https://github.com/simonw/research/blob/main/chatgpt-container-environment/README.md\">the resulting Markdown report</a> it created.</p>\n<p>The key magic appears to be an <code>applied-caas-gateway1.internal.api.openai.org</code> proxy, available within the container and with various packaging tools configured to use it.</p>\n<p>The following environment variables cause <code>pip</code> and <code>uv</code> to install packages from that proxy instead of directly from PyPI:</p>\n<pre><code>PIP_INDEX_URL=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../pypi-public/simple\nPIP_TRUSTED_HOST=packages.applied-caas-gateway1.internal.api.openai.org\nUV_INDEX_URL=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../pypi-public/simple\nUV_INSECURE_HOST=https://packages.applied-caas-gateway1.internal.api.openai.org\n</code></pre>\n<p>This one appears to get <code>npm</code> to work:</p>\n<pre><code>NPM_CONFIG_REGISTRY=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../npm-public\n</code></pre>\n<p>And it reported these suspicious looking variables as well:</p>\n<pre><code>CAAS_ARTIFACTORY_BASE_URL=packages.applied-caas-gateway1.internal.api.openai.org\nCAAS_ARTIFACTORY_PYPI_REGISTRY=.../artifactory/api/pypi/pypi-public\nCAAS_ARTIFACTORY_NPM_REGISTRY=.../artifactory/api/npm/npm-public\nCAAS_ARTIFACTORY_GO_REGISTRY=.../artifactory/api/go/golang-main\nCAAS_ARTIFACTORY_MAVEN_REGISTRY=.../artifactory/maven-public\nCAAS_ARTIFACTORY_GRADLE_REGISTRY=.../artifactory/gradle-public\nCAAS_ARTIFACTORY_CARGO_REGISTRY=.../artifactory/api/cargo/cargo-public/index\nCAAS_ARTIFACTORY_DOCKER_REGISTRY=.../dockerhub-public\nCAAS_ARTIFACTORY_READER_USERNAME=reader\nCAAS_ARTIFACTORY_READER_PASSWORD=****\nNETWORK=caas_packages_only\n</code></pre>\n<p>Neither Rust nor Docker are installed in the container environment, but maybe those registry references are a clue 
of features still to come.</p>\n<h4 id=\"adding-it-all-together\">Adding it all together</h4>\n<p>The result of all of this? You can tell ChatGPT to use Python or Node.js packages as part of a conversation and it will be able to install them and apply them against files you upload or that it downloads from the public web. That's <em>really</em> cool.</p>\n<p>The big missing feature here should be the easiest to provide: we need <strong>official documentation</strong>! A release notes entry would be a good start, but there are a lot of subtle details to how this new stuff works, its limitations and what it can be used for.</p>\n<p>As always, I'd also encourage OpenAI to come up with a name for this set of features that properly represents how it works and what it can do.</p>\n<p>In the meantime, I'm going to call this <strong>ChatGPT Containers</strong>.</p>\n<h4 id=\"list-of-tools\">Update: a full list of tools</h4>\n<p>I decided to ask ChatGPT about other tools that were available to it in case there was anything interesting in there:</p>\n<blockquote><p>List all tools that are available to you, with their exact names and descriptions and signatures</p></blockquote>\n<p>Here's <a href=\"https://chatgpt.com/share/6977ffa0-df14-8006-9647-2b8c90ccbb81\">what I got back</a>.</p>\n\n<details><summary>List of ChatGPT tools according to GPT-5.2 Thinking on 26th January 2026</summary>\n<p style=\"margin-top: 1em\">Below are <strong>all tools available to me in this chat</strong>, with their <strong>exact names</strong>, <strong>descriptions</strong>, and <strong>signatures</strong> (as defined in my tool interface).</p>\n<hr />\n<h4 id=\"python\">python</h4>\n<h5 id=\"python-exec-freeform-any\">\n<code>python.exec</code> (FREEFORM) → any</h5>\n<p><strong>Description:</strong> Execute Python code in a Jupyter environment for <em>private/internal reasoning</em> (not shown to you). 
Internet access is disabled.<br />\n<strong>Signature:</strong> <code>python.exec(FREEFORM) -> any</code></p>\n<hr />\n<h4 id=\"web\">web</h4>\n<h5 id=\"web-run-toolcallv5-json-any\">\n<code>web.run</code> (ToolCallV5 JSON) → any</h5>\n<p><strong>Description:</strong> Access the internet and specialized lookups (search, open/click/find pages, screenshot PDFs, image search, product search, finance, weather, sports, calculator, time). Supports citations via returned <code>turn...</code> reference IDs.<br />\n<strong>Signature:</strong><br />\n<code>web.run({ open?, click?, find?, screenshot?, image_query?, product_query?, sports?, finance?, weather?, calculator?, time?, search_query?, response_length? }) -> any</code></p>\n<hr />\n<h4 id=\"automations\">automations</h4>\n<h5 id=\"automations-create-any\">\n<code>automations.create</code> → any</h5>\n<p><strong>Description:</strong> Schedule a future/recurring task/reminder/search via iCal VEVENT.<br />\n<strong>Signature:</strong><br />\n<code>automations.create({ prompt: string, title: string, schedule?: string, dtstart_offset_json?: string }) -> any</code></p>\n<h5 id=\"automations-update-any\">\n<code>automations.update</code> → any</h5>\n<p><strong>Description:</strong> Modify an existing scheduled task.<br />\n<strong>Signature:</strong><br />\n<code>automations.update({ jawbone_id: string, schedule?: string, dtstart_offset_json?: string, prompt?: string, title?: string, is_enabled?: boolean }) -> any</code></p>\n<h5 id=\"automations-list-any\">\n<code>automations.list</code> → any</h5>\n<p><strong>Description:</strong> List existing automations.<br />\n<strong>Signature:</strong> <code>automations.list() -> any</code></p>\n<hr />\n<h4 id=\"guardian-tool\">guardian_tool</h4>\n<h5 id=\"guardian-tool-get-policy-str\">\n<code>guardian_tool.get_policy</code> → str</h5>\n<p><strong>Description:</strong> Retrieve policy guidance for specific categories (currently: <code>election_voting</code>).<br 
/>\n<strong>Signature:</strong> <code>guardian_tool.get_policy({ category: string }) -> str</code></p>\n<hr />\n<h4 id=\"file-search\">file_search</h4>\n<h5 id=\"file-search-msearch-any\">\n<code>file_search.msearch</code> → any</h5>\n<p><strong>Description:</strong> Search across available internal/connected sources (in this environment: primarily “recording_knowledge” transcripts when relevant).<br />\n<strong>Signature:</strong><br />\n<code>file_search.msearch({ queries?: string[], source_filter?: string[], file_type_filter?: string[], intent?: string, time_frame_filter?: { start_date?: string, end_date?: string } }) -> any</code></p>\n<h5 id=\"file-search-mclick-any\">\n<code>file_search.mclick</code> → any</h5>\n<p><strong>Description:</strong> Expand/open previously returned search results (or connector URLs) for more context.<br />\n<strong>Signature:</strong><br />\n<code>file_search.mclick({ pointers?: string[], start_date?: string, end_date?: string }) -> any</code></p>\n<hr />\n<h4 id=\"gmail-read-only-\">gmail (read-only)</h4>\n<h5 id=\"gmail-search-email-ids-any\">\n<code>gmail.search_email_ids</code> → any</h5>\n<p><strong>Description:</strong> Search Gmail message IDs by query/tags (read-only).<br />\n<strong>Signature:</strong><br />\n<code>gmail.search_email_ids({ query?: string, tags?: string[], max_results?: integer, next_page_token?: string }) -> any</code></p>\n<h5 id=\"gmail-batch-read-email-any\">\n<code>gmail.batch_read_email</code> → any</h5>\n<p><strong>Description:</strong> Read a batch of Gmail messages by message IDs (read-only).<br />\n<strong>Signature:</strong><br />\n<code>gmail.batch_read_email({ message_ids: string[] }) -> any</code></p>\n<hr />\n<h4 id=\"gcal-read-only-\">gcal (read-only)</h4>\n<h5 id=\"gcal-search-events-any\">\n<code>gcal.search_events</code> → any</h5>\n<p><strong>Description:</strong> Search Google Calendar events in a time range and/or by keyword (read-only).<br />\n<strong>Signature:</strong><br 
/>\n<code>gcal.search_events({ time_min?: string, time_max?: string, timezone_str?: string, max_results?: integer, query?: string, calendar_id?: string, next_page_token?: string }) -> any</code></p>\n<h5 id=\"gcal-read-event-any\">\n<code>gcal.read_event</code> → any</h5>\n<p><strong>Description:</strong> Read a specific calendar event by ID (read-only).<br />\n<strong>Signature:</strong><br />\n<code>gcal.read_event({ event_id: string, calendar_id?: string }) -> any</code></p>\n<hr />\n<h4 id=\"gcontacts-read-only-\">gcontacts (read-only)</h4>\n<h5 id=\"gcontacts-search-contacts-any\">\n<code>gcontacts.search_contacts</code> → any</h5>\n<p><strong>Description:</strong> Search Google Contacts by keyword (read-only).<br />\n<strong>Signature:</strong><br />\n<code>gcontacts.search_contacts({ query: string, max_results?: integer }) -> any</code></p>\n<hr />\n<h4 id=\"canmore-canvas-\">canmore (canvas)</h4>\n<h5 id=\"canmore-create-textdoc-any\">\n<code>canmore.create_textdoc</code> → any</h5>\n<p><strong>Description:</strong> Create a new canvas text document (markdown or code).<br />\n<strong>Signature:</strong><br />\n<code>canmore.create_textdoc({ name: string, type: \"document\" | \"code/bash\" | \"code/zsh\" | \"code/javascript\" | \"code/typescript\" | \"code/html\" | \"code/css\" | \"code/python\" | \"code/json\" | \"code/sql\" | \"code/go\" | \"code/yaml\" | \"code/java\" | \"code/rust\" | \"code/cpp\" | \"code/swift\" | \"code/php\" | \"code/xml\" | \"code/ruby\" | \"code/haskell\" | \"code/kotlin\" | \"code/csharp\" | \"code/c\" | \"code/objectivec\" | \"code/r\" | \"code/lua\" | \"code/dart\" | \"code/scala\" | \"code/perl\" | \"code/commonlisp\" | \"code/clojure\" | \"code/ocaml\" | \"code/powershell\" | \"code/verilog\" | \"code/dockerfile\" | \"code/vue\" | \"code/react\" | \"code/other\", content: string }) -> any</code></p>\n<h5 id=\"canmore-update-textdoc-any\">\n<code>canmore.update_textdoc</code> → any</h5>\n<p><strong>Description:</strong> Apply 
regex-based edits to the current canvas textdoc.<br />\n<strong>Signature:</strong><br />\n<code>canmore.update_textdoc({ updates: { pattern: string, multiple?: boolean, replacement: string }[] }) -> any</code></p>\n<h5 id=\"canmore-comment-textdoc-any\">\n<code>canmore.comment_textdoc</code> → any</h5>\n<p><strong>Description:</strong> Add actionable comments on matched regions of the current canvas textdoc.<br />\n<strong>Signature:</strong><br />\n<code>canmore.comment_textdoc({ comments: { pattern: string, comment: string }[] }) -> any</code></p>\n<hr />\n<h4 id=\"python-user-visible\">python_user_visible</h4>\n<h5 id=\"python-user-visible-exec-freeform-any\">\n<code>python_user_visible.exec</code> (FREEFORM) → any</h5>\n<p><strong>Description:</strong> Execute Python code <em>that you will see</em> (tables, plots, generated files). Internet disabled.<br />\n<strong>Signature:</strong> <code>python_user_visible.exec(FREEFORM) -> any</code></p>\n<hr />\n<h4 id=\"user-info\">user_info</h4>\n<h5 id=\"user-info-get-user-info-any\">\n<code>user_info.get_user_info</code> → any</h5>\n<p><strong>Description:</strong> Get your current location and local time (or UTC if unknown).<br />\n<strong>Signature:</strong> <code>user_info.get_user_info({}) -> any</code></p>\n<hr />\n<h4 id=\"summary-reader\">summary_reader</h4>\n<h5 id=\"summary-reader-read-any\">\n<code>summary_reader.read</code> → any</h5>\n<p><strong>Description:</strong> Read safe-to-share internal reasoning summaries from earlier turns (used when asked how I arrived at something).<br />\n<strong>Signature:</strong> <code>summary_reader.read({ limit?: integer, offset?: integer }) -> any</code></p>\n<hr />\n<h4 id=\"container\">container</h4>\n<h5 id=\"container-exec-any\">\n<code>container.exec</code> → any</h5>\n<p><strong>Description:</strong> Run a command inside the container and return stdout/stderr.<br />\n<strong>Signature:</strong><br />\n<code>container.exec({ cmd: string[], session_name?: string | 
null, workdir?: string | null, timeout?: integer | null, env?: object | null, user?: string | null }) -> any</code></p>\n<h5 id=\"container-feed-chars-any\">\n<code>container.feed_chars</code> → any</h5>\n<p><strong>Description:</strong> Send input to a running interactive exec session and flush output.<br />\n<strong>Signature:</strong><br />\n<code>container.feed_chars({ session_name: string, chars: string, yield_time_ms?: integer }) -> any</code></p>\n<h5 id=\"container-open-image-any\">\n<code>container.open_image</code> → any</h5>\n<p><strong>Description:</strong> Display an image from an absolute path in the container (jpg/jpeg/png/webp).<br />\n<strong>Signature:</strong><br />\n<code>container.open_image({ path: string, user?: string | null }) -> any</code></p>\n<h5 id=\"container-download-any\">\n<code>container.download</code> → any</h5>\n<p><strong>Description:</strong> Download a file from a URL into the container filesystem.<br />\n<strong>Signature:</strong><br />\n<code>container.download({ url: string, filepath: string }) -> any</code></p>\n<hr />\n<h4 id=\"bio-memory-\">bio (memory)</h4>\n<h5 id=\"bio-update-freeform-any\">\n<code>bio.update</code> (FREEFORM) → any</h5>\n<p><strong>Description:</strong> Persist (or forget) user-specific info across conversations (“memory”).<br />\n<strong>Signature:</strong> <code>bio.update(FREEFORM) -> any</code></p>\n<hr />\n<h4 id=\"api-tool\">api_tool</h4>\n<h5 id=\"api-tool-list-resources-any\">\n<code>api_tool.list_resources</code> → any</h5>\n<p><strong>Description:</strong> List available “resources” (including invokable tools) in an OP-style file tree.<br />\n<strong>Signature:</strong><br />\n<code>api_tool.list_resources({ path?: string, cursor?: string | null, only_tools?: boolean, refetch_tools?: boolean }) -> any</code></p>\n<h5 id=\"api-tool-call-tool-any\">\n<code>api_tool.call_tool</code> → any</h5>\n<p><strong>Description:</strong> Invoke a resource as a tool (must know its full path/URI from 
<code>list_resources</code>).<br />\n<strong>Signature:</strong><br />\n<code>api_tool.call_tool({ path: string, args: object }) -> any</code></p>\n<hr />\n<h4 id=\"personal-context\">personal_context</h4>\n<h5 id=\"personal-context-search-any\">\n<code>personal_context.search</code> → any</h5>\n<p><strong>Description:</strong> Retrieve user-specific context from prior interactions/linked sources to maintain continuity.<br />\n<strong>Signature:</strong><br />\n<code>personal_context.search({ query: string }) -> any</code></p>\n<hr />\n<h4 id=\"image-gen\">image_gen</h4>\n<h5 id=\"image-gen-text2im-any\">\n<code>image_gen.text2im</code> → any</h5>\n<p><strong>Description:</strong> Generate or edit images from text instructions (and optionally referenced image IDs).<br />\n<strong>Signature:</strong><br />\n<code>image_gen.text2im({ prompt?: string | null, size?: string | null, n?: integer | null, transparent_background?: boolean | null, is_style_transfer?: boolean | null, referenced_image_ids?: string[] | null }) -> any</code></p>\n<hr />\n<h4 id=\"user-settings\">user_settings</h4>\n<h5 id=\"user-settings-get-user-settings-any\">\n<code>user_settings.get_user_settings</code> → any</h5>\n<p><strong>Description:</strong> Read current settings and allowed values (personality, accent color, appearance).<br />\n<strong>Signature:</strong> <code>user_settings.get_user_settings() -> any</code></p>\n<h5 id=\"user-settings-set-setting-any\">\n<code>user_settings.set_setting</code> → any</h5>\n<p><strong>Description:</strong> Change one setting (accent color, appearance, or personality) to an allowed value.<br />\n<strong>Signature:</strong><br />\n<code>user_settings.set_setting({ setting_name: \"accent_color\" | \"appearance\" | \"personality\", setting_value: string }) -> any</code></p>\n<hr />\n<h4 id=\"artifact-handoff\">artifact_handoff</h4>\n<h5 id=\"artifact-handoff-prepare-artifact-generation-any\">\n<code>artifact_handoff.prepare_artifact_generation</code> → 
any</h5>\n<p><strong>Description:</strong> Must be called immediately when the user asks for a spreadsheet or slide deck artifact.<br />\n<strong>Signature:</strong> <code>artifact_handoff.prepare_artifact_generation() -> any</code></p>\n</details>\n \n <p>Tags: <a href=\"https://simonwillison.net/tags/pypi\">pypi</a>, <a href=\"https://simonwillison.net/tags/sandboxing\">sandboxing</a>, <a href=\"https://simonwillison.net/tags/npm\">npm</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/openai\">openai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/chatgpt\">chatgpt</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-assisted-programming\">ai-assisted-programming</a>, <a href=\"https://simonwillison.net/tags/code-interpreter\">code-interpreter</a></p>",
"url": "https://simonwillison.net/2026/Jan/26/chatgpt-containers/#atom-everything",
"published": "2026-01-26T19:19:31.000Z",
"updated": "2026-01-26T19:19:31.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "pypi",
"term": "pypi",
"url": null
},
{
"label": "sandboxing",
"term": "sandboxing",
"url": null
},
{
"label": "npm",
"term": "npm",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "openai",
"term": "openai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "chatgpt",
"term": "chatgpt",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-assisted-programming",
"term": "ai-assisted-programming",
"url": null
},
{
"label": "code-interpreter",
"term": "code-interpreter",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/#atom-everything",
"title": "the browser is the sandbox",
"description": "<p><strong><a href=\"https://aifoc.us/the-browser-is-the-sandbox/\">the browser is the sandbox</a></strong></p>\nPaul Kinlan is a web platform developer advocate at Google and recently turned his attention to coding agents. He quickly identified the importance of a robust sandbox for agents to operate in and put together these detailed notes on how the web browser can help:</p>\n<blockquote>\n<p>This got me thinking about the browser. Over the last 30 years, we have built a sandbox specifically designed to run incredibly hostile, untrusted code from anywhere on the web, the instant a user taps a URL. [...]</p>\n<p>Could you build something like Cowork in the browser? Maybe. To find out, I built a demo called <a href=\"http://co-do.xyz\">Co-do</a> that tests this hypothesis. In this post I want to discuss the research I've done to see how far we can get, and determine if the browser's ability to run untrusted code is useful (and good enough) for enabling software to do more for us directly on our computer.</p>\n</blockquote>\n<p>Paul then describes how the three key aspects of a sandbox - filesystem, network access and safe code execution - can be handled by browser technologies: the <a href=\"https://developer.chrome.com/docs/capabilities/web-apis/file-system-access\">File System Access API</a> (still Chrome-only as far as I can tell), CSP headers with <code><iframe sandbox></code> and WebAssembly in Web Workers.</p>\n<p>Co-do is a very interesting demo that illustrates all of these ideas in a single application:</p>\n<p><img alt=\"Screenshot of Co-do application interface with robot logo. 
Left sidebar shows WORKSPACE section with \"Select Folder\" button, \"Selected folder: blog-drafts (Live updates enabled)\" in green text, and FILES list including gtr-t5-large.md, chatgpt.md, llm-digest-october-2..., weeknotes-2024-04-23..., lmarena-april-2025.md, weeknotes-chatgpt-fo..., tests-not-optional-c..., weeknotes-13-mar-202..., issue-for-notes.md, workshop-git-scrapin.... Main chat area shows user message \"Three most recently edited files?\" in orange bubble, \"Using 26 tools...\" dropdown, and AI response: \"Now I need to get the metadata for these files to find the most recently edited ones. Since there are many files, let me get metadata for a sample of them to find the recent ones:Based on the metadata I've gathered, the three most recently edited files are:\" followed by table with columns File and Last Modified showing predictions-2026.md (Jan 8, 2026), december-2025.md (Jan 2, 2026), gemini-3-flash.md (Dec 17, 2025). Green \"Response complete\" banner at bottom with input field placeholder \"Ask me to help with your files...\" and notice \"File contents are sent to your selected AI provider\".\" src=\"https://static.simonwillison.net/static/2026/codo.jpg\" /></p>\n<p>You select a folder full of files, configure an LLM provider and set an API key; Co-do then uses CSP-approved API calls to interact with that provider and provides a chat interface with tools for interacting with those files. It does indeed feel similar to <a href=\"https://simonwillison.net/2026/Jan/12/claude-cowork/\">Claude Cowork</a> but without running a multi-GB local container to provide the sandbox.</p>\n<p>My biggest complaint about <code><iframe sandbox></code> remains how thinly documented it is, especially across different browsers. 
Paul's post has all sorts of useful details on that which I've not encountered elsewhere, including a complex <a href=\"https://aifoc.us/the-browser-is-the-sandbox/#the-double-iframe-technique\">double-iframe technique</a> to help apply network rules to the inner of the two frames.</p>\n<p>Thanks to this post I also learned about the <code><input type=\"file\" webkitdirectory></code> tag which turns out to work on Firefox, Safari <em>and</em> Chrome and allows a browser read-only access to a full directory of files at once. I had Claude knock up a <a href=\"https://tools.simonwillison.net/webkitdirectory\">webkitdirectory demo</a> to try it out and I'll certainly be using it for projects in the future.</p>\n<p><img alt=\"Screenshot of a dark-themed file explorer application. Top stats bar shows: 12179 TOTAL FILES, 2079 FOLDERS, 244 MB TOTAL SIZE, 97 FILE TYPES in cyan text. Search bar with placeholder \"Search files...\" and \"All types\" dropdown. Left panel labeled \"File tree\" (showing 12179) displays folder hierarchy: datasette > .claude > skills > building-datasette-plugins containing HOOKS.md (10.7 KB, selected/highlighted), INTERNALS.md (10.1 KB), SKILL.md (3.7 KB), TESTING.md (8.4 KB), settings.local.json (280 B); also shows .eggs folder with pytest_runner-6.0.1-py3.9.egg. Right panel \"File preview\" shows selected file details: Name: HOOKS.md, Path: datasette/.claude/skills/building-datasette-plugins/HOOKS.md, Size: 10.7 KB, Type: text/markdown, Last modified: 12/20/2025, 9:28:59 AM. Preview content shows: \"# Plugin Hooks Reference\" followed by \"All hooks use the @hookimpl decorator. Accept only the parameters you need.\" then \"## Database Connection Hooks\" and \"### prepare_connection(conn, database, datasette)\" with description \"Called when a new SQLite connection is created. 
Use to register custom SQL functions.\" Bottom section \"File type distribution\" shows horizontal bar chart: .py (4439), .no ext (3358), .dat (1068), .pyc (925), .txt (332), .mo (321), .po (321), .html (249).\" src=\"https://static.simonwillison.net/static/2026/webkit-file-tree.jpg\" /></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/browsers\">browsers</a>, <a href=\"https://simonwillison.net/tags/javascript\">javascript</a>, <a href=\"https://simonwillison.net/tags/sandboxing\">sandboxing</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-agents\">ai-agents</a>, <a href=\"https://simonwillison.net/tags/coding-agents\">coding-agents</a>, <a href=\"https://simonwillison.net/tags/claude-code\">claude-code</a></p>",
"url": "https://simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/#atom-everything",
"published": "2026-01-25T23:51:32.000Z",
"updated": "2026-01-25T23:51:32.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "browsers",
"term": "browsers",
"url": null
},
{
"label": "javascript",
"term": "javascript",
"url": null
},
{
"label": "sandboxing",
"term": "sandboxing",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-agents",
"term": "ai-agents",
"url": null
},
{
"label": "coding-agents",
"term": "coding-agents",
"url": null
},
{
"label": "claude-code",
"term": "claude-code",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/25/kakapo-cam/#atom-everything",
"title": "Kākāpō Cam: Rakiura live stream",
"description": "<p><strong><a href=\"https://www.doc.govt.nz/our-work/kakapo-recovery/what-we-do/kakapo-cam-rakiura-live-stream/\">Kākāpō Cam: Rakiura live stream</a></strong></p>\nCritical update for this year's <a href=\"https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#1-year-k-k-p-parrots-will-have-an-outstanding-breeding-season\">Kākāpō breeding season</a>: the New Zealand Department of Conservation have a livestream running of Rakiura's nest!</p>\n<blockquote>\n<p>You’re looking at the underground nest of 23-year-old Rakiura. She has chosen this same site to nest for all seven breeding seasons since 2008, a large cavity under a rātā tree. Because she returns to the site so reliably, we’ve been able to make modifications over the years to keep it safe and dry, including adding a well-placed hatch for monitoring eggs and chicks.</p>\n</blockquote>\n<p>Rakiura is a legendary Kākāpō:</p>\n<blockquote>\n<p>Rakiura hatched on 19 February 2002 on Whenua Hou/Codfish Island. She is the offspring of Flossie and Bill. Her name comes from the te reo Māori name for Stewart Island, the place where most of the founding kākāpō population originated.</p>\n<p>Rakiura has nine living descendants, three females and six males, across six breeding seasons. In 2008 came Tōitiiti, in 2009 Tamahou and Te Atapō, in 2011 Tia and Tūtoko, in 2014 Taeatanga and Te Awa, in 2019 Mati-mā and Tautahi. She also has many grandchicks.</p>\n</blockquote>\n<p>She laid her first egg of the season at 4:30pm NZ time on 22nd January. The livestream went live shortly afterwards, once she committed to this nest.</p>\n<p>The stream is <a href=\"https://www.youtube.com/watch?v=BfGL7A2YgUY\">on YouTube</a>. 
I <a href=\"https://gisthost.github.io/?dc78322de89a2191c593215f109c65d7/index.html\">used Claude Code</a> to write <a href=\"https://tools.simonwillison.net/python/#livestream-gifpy\">a livestream-gif.py script</a> and used that to capture this sped-up video of the last few hours of footage, within which you can catch a glimpse of the egg!</p>\n<video autoplay muted loop controls playsinline style=\"width: 100%;\">\n <source src=\"https://static.simonwillison.net/static/2026/kakapo-timelapse.mp4\" type=\"video/mp4\">\n</video>\n\n <p><small>Via <a href=\"https://www.metafilter.com/211927/The-only-parrot-to-have-a-polygynous-lek-breeding-system-sits-on-an-egg\">MetaFilter</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/youtube\">youtube</a>, <a href=\"https://simonwillison.net/tags/kakapo\">kakapo</a>, <a href=\"https://simonwillison.net/tags/conservation\">conservation</a>, <a href=\"https://simonwillison.net/tags/claude-code\">claude-code</a></p>",
"url": "https://simonwillison.net/2026/Jan/25/kakapo-cam/#atom-everything",
"published": "2026-01-25T04:53:01.000Z",
"updated": "2026-01-25T04:53:01.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "youtube",
"term": "youtube",
"url": null
},
{
"label": "kakapo",
"term": "kakapo",
"url": null
},
{
"label": "conservation",
"term": "conservation",
"url": null
},
{
"label": "claude-code",
"term": "claude-code",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/24/dont-trust-the-process/#atom-everything",
"title": "Don't \"Trust the Process\"",
"description": "<p><strong><a href=\"https://www.youtube.com/watch?v=4u94juYwLLM\">Don't \"Trust the Process\"</a></strong></p>\nJenny Wen, Design Lead at Anthropic (and previously Director of Design at Figma) gave a provocative keynote at Hatch Conference in Berlin last September.</p>\n<p><img alt=\"Don't \"Trust the process\" slide, speaker shown on the left\" src=\"https://static.simonwillison.net/static/2026/dont-trust-process.jpg\" /></p>\n<p>Jenny argues that the Design Process - user research leading to personas leading to user journeys leading to wireframes... all before anything gets built - may be outdated for today's world.</p>\n<blockquote>\n<p><strong>Hypothesis</strong>: In a world where anyone can make anything — what matters is your ability to choose and curate what you make.</p>\n</blockquote>\n<p>In place of the Process, designers should lean into prototypes. AI makes these much more accessible and less time-consuming than they used to be.</p>\n<p>Watching this talk made me think about how AI-assisted programming significantly reduces the cost of building the <em>wrong</em> thing. Previously if the design wasn't right you could waste months of development time building in the wrong direction, which was a very expensive mistake. 
If a wrong direction wastes just a few days instead, we can take more risks and be much more proactive in exploring the problem space.</p>\n<p>I've always been a compulsive prototyper though, so this is very much playing into my own existing biases!</p>\n\n <p><small>Via <a href=\"https://twitter.com/jenny_wen/status/2014479445738893649\">@jenny_wen</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/design\">design</a>, <a href=\"https://simonwillison.net/tags/prototyping\">prototyping</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-assisted-programming\">ai-assisted-programming</a>, <a href=\"https://simonwillison.net/tags/vibe-coding\">vibe-coding</a></p>",
"url": "https://simonwillison.net/2026/Jan/24/dont-trust-the-process/#atom-everything",
"published": "2026-01-24T23:31:03.000Z",
"updated": "2026-01-24T23:31:03.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "design",
"term": "design",
"url": null
},
{
"label": "prototyping",
"term": "prototyping",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-assisted-programming",
"term": "ai-assisted-programming",
"url": null
},
{
"label": "vibe-coding",
"term": "vibe-coding",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/24/jasmine-sun/#atom-everything",
"title": "Quoting Jasmine Sun",
"description": "<blockquote cite=\"https://jasmi.news/p/claude-code\"><p><strong>If you tell a friend they can now instantly create any app, they’ll probably say “Cool! Now I need to think of an idea.”</strong> Then they will forget about it, and never build a thing. The problem is not that your friend is horribly uncreative. It’s that most people’s problems are not software-shaped, and most won’t notice even when they are. [...]</p>\n<p>Programmers are trained to see everything as a software-shaped problem: if you do a task three times, you should probably automate it with a script. <em>Rename every IMG_*.jpg file from the last week to hawaii2025_*.jpg</em>, they tell their terminal, while the rest of us painfully click and copy-paste. We are blind to the solutions we were never taught to see, asking for faster horses and never dreaming of cars.</p></blockquote>\n<p class=\"cite\">— <a href=\"https://jasmi.news/p/claude-code\">Jasmine Sun</a></p>\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/vibe-coding\">vibe-coding</a>, <a href=\"https://simonwillison.net/tags/coding-agents\">coding-agents</a>, <a href=\"https://simonwillison.net/tags/claude-code\">claude-code</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a></p>",
"url": "https://simonwillison.net/2026/Jan/24/jasmine-sun/#atom-everything",
"published": "2026-01-24T21:34:35.000Z",
"updated": "2026-01-24T21:34:35.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "vibe-coding",
"term": "vibe-coding",
"url": null
},
{
"label": "coding-agents",
"term": "coding-agents",
"url": null
},
{
"label": "claude-code",
"term": "claude-code",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/23/fastrender/#atom-everything",
"title": "Wilson Lin on FastRender: a browser built by thousands of parallel agents",
"description": "<p>Last week Cursor published <a href=\"https://cursor.com/blog/scaling-agents\">Scaling long-running autonomous coding</a>, an article describing their research efforts into coordinating large numbers of autonomous coding agents. One of the projects mentioned in the article was <a href=\"https://github.com/wilsonzlin/fastrender\">FastRender</a>, a web browser they built from scratch using their agent swarms. I wanted to learn more so I asked Wilson Lin, the engineer behind FastRender, if we could record a conversation about the project. That 47 minute video is <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4\">now available on YouTube</a>. I've included some of the highlights below.</p>\n\n<iframe style=\"margin-top: 1.5em; margin-bottom: 1.5em;\" width=\"560\" height=\"315\" src=\"https://www.youtube-nocookie.com/embed/bKrAcTf2pL4\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen=\"allowfullscreen\"> </iframe>\n\n<p>See my <a href=\"https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/\">previous post</a> for my notes and screenshots from trying out FastRender myself.</p>\n\n\n<h4 id=\"what-fastrender-can-do-right-now\">What FastRender can do right now</h4>\n<p>We started the conversation with a demo of FastRender loading different pages (<a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=195s\">03:15</a>). The JavaScript engine isn't working yet so we instead loaded <a href=\"https://github.com/wilsonzlin/fastrender\">github.com/wilsonzlin/fastrender</a>, <a href=\"https://en.wikipedia.org/\">Wikipedia</a> and <a href=\"https://cnn.com\">CNN</a> - all of which were usable, if a little slow to display.</p>\n<p>JavaScript had been disabled by one of the agents, which decided to add a feature flag! 
<a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=242s\">04:02</a></p>\n<blockquote>\n<p>JavaScript is disabled right now. The agents made a decision as they were currently still implementing the engine and making progress towards other parts... they decided to turn it off or put it behind a feature flag, technically.</p>\n</blockquote>\n<h4 id=\"from-side-project-to-core-research\">From side-project to core research</h4>\n<p>Wilson started what became FastRender as a personal side-project to explore the capabilities of the latest generation of frontier models - Claude Opus 4.5, GPT-5.1, and GPT-5.2. <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=56s\">00:56</a></p>\n<blockquote>\n<p>FastRender was a personal project of mine from, I'd say, November. It was an experiment to see how well frontier models like Opus 4.5 and back then GPT-5.1 could do with much more complex, difficult tasks.</p>\n</blockquote>\n<p>A browser rendering engine was the ideal choice for this, because it's both <em>extremely</em> ambitious and complex but also well specified. And you can visually see how well it's working! <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=117s\">01:57</a></p>\n<blockquote>\n<p>As that experiment progressed, I was seeing better and better results from single agents that were able to actually make good progress on this project. And at that point, I wanted to see, well, what's the next level? How do I push this even further?</p>\n</blockquote>\n<p>Once it became clear that this was an opportunity to try multiple agents working together it graduated to an official Cursor research project, and available resources were amplified.</p>\n<p>The goal of FastRender was never to build a browser to compete with the likes of Chrome. 
<a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=2512s\">41:52</a></p>\n<blockquote>\n<p>We never intended for it to be a production software or usable, but we wanted to observe behaviors of this harness of multiple agents, to see how they could work at scale.</p>\n</blockquote>\n<p>The great thing about a browser is that it has such a large scope that it can keep serving experiments in this space for many years to come. JavaScript, then WebAssembly, then WebGPU... it could take many years to run out of new challenges for the agents to tackle.</p>\n<h4 id=\"running-thousands-of-agents-at-once\">Running thousands of agents at once</h4>\n<p>The most interesting thing about FastRender is the way the project used multiple agents working in parallel to build different parts of the browser. I asked how many agents were running at once: <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=324s\">05:24</a></p>\n<blockquote>\n<p>At the peak, when we had the stable system running for one week continuously, there were approximately 2,000 agents running concurrently at one time. And they were making, I believe, thousands of commits per hour.</p>\n</blockquote>\n<p>The project has <a href=\"https://github.com/wilsonzlin/fastrender/commits/main/\">nearly 30,000 commits</a>!</p>\n<p>How do you run 2,000 agents at once? They used <em>really big machines</em>. <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=356s\">05:56</a></p>\n<blockquote>\n<p>The simple approach we took with the infrastructure was to have a large machine run one of these multi-agent harnesses. Each machine had ample resources, and it would run about 300 agents concurrently on each. This was able to scale and run reasonably well, as agents spend a lot of time thinking, and not just running tools.</p>\n</blockquote>\n<p>At this point we switched to a live demo of the harness running on one of those big machines (<a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=392s\">06:32</a>). 
The agents are arranged in a tree structure, with planning agents firing up tasks and worker agents then carrying them out. <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=434s\">07:14</a></p>\n<p><img src=\"https://static.simonwillison.net/static/2026/wilson-lin-agents.jpg\" alt=\"Terminal window showing a tmux session running \"grind-swarm\" task manager with RUNNING status. Header shows \"grind-swarm – 45:54:15\" with stats \"planners: 9 (0 done) | tasks: 111 working, 0 pending, 232 done | 12900.9M↑ 514.1M↓\". Task list includes: p1 Root (main), p2 CSS selector matching performance + bloom filter integration, p3 CSS stylesheet parsing semantics & at-rule handling, p4 Custom properties (@property) + var() resolution + incremental recompute/invalidation, p37 CSS at-rule artifact integration, p50 Selector engine correctness & spec coverage, p51 Computed-value + property coverage across css-cascade, p105 Style sharing / computed style caching in fastrender-style, p289 CSS cascade layers (@layer) global ordering, w5 Fix workspace lockfile drift, w7 Implement computed-style snapshot sharing, w15 Fix css-properties namespace handling, w17 (Stretch) Enable bloom fast-reject in HTML quirks mode, w18 Refactor css-properties stylesheet parsing. Activity log shows shell commands including cargo check, git status, git push origin main, and various test runs. Bottom status bar shows \"grind-css0:target/release/grind-swarm*\" and \"streamyard.com is sharing your screen\" notification with timestamp \"12:02 22-Jan-26\".\" style=\"max-width: 100%;\" /></p>\n<blockquote>\n<p>This cluster of agents is working towards building out the CSS aspects of the browser, whether that's parsing, selector engine, those features. 
We managed to push this even further by splitting out the browser project into multiple instructions or work streams and have each one run one of these harnesses on their own machine, so that was able to further parallelize and increase throughput.</p>\n</blockquote>\n<p>But don't all of these agents working on the same codebase result in a huge amount of merge conflicts? Apparently not: <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=501s\">08:21</a></p>\n<blockquote>\n<p>We've noticed that most commits do not have merge conflicts. The reason is the harness itself is able to quite effectively split out and divide the scope and tasks such that it tries to minimize the amount of overlap of work. That's also reflected in the code structure—commits will be made at various times and they don't tend to touch each other at the same time.</p>\n</blockquote>\n<p>This appears to be the key trick for unlocking benefits from parallel agents: if planning agents do a good enough job of breaking up the work into non-overlapping chunks you can bring hundreds or even thousands of agents to bear on a problem at once.</p>\n<p>Surprisingly, Wilson found that GPT-5.1 and GPT-5.2 were a better fit for this work than the coding specialist GPT-5.1-Codex: <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=1048s\">17:28</a></p>\n<blockquote>\n<p>Some initial findings were that the instructions here were more expansive than merely coding. For example, how to operate and interact within a harness, or how to operate autonomously without interacting with the user or having a lot of user feedback. 
These kinds of instructions we found worked better with the general models.</p>\n</blockquote>\n<p>I asked about the longest they've seen this system run without human intervention: <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=1108s\">18:28</a></p>\n<blockquote>\n<p>So this system, once you give an instruction, there's actually no way to steer it, you can't prompt it, you're going to adjust how it goes. The only thing you can do is stop it. So our longest run, all the runs are basically autonomous. We don't alter the trajectory while executing. [...]</p>\n<p>And so the longest at the time of the post was about a week and that's pretty close to the longest. Of course the research project itself was only about three weeks so you know we probably can go longer.</p>\n</blockquote>\n<h4 id=\"specifications-and-feedback-loops\">Specifications and feedback loops</h4>\n<p>An interesting aspect of this project's design is its feedback loops. For agents to work autonomously for long periods of time they need as much useful context about the problem they are solving as possible, combined with effective feedback loops to help them make decisions.</p>\n<p>The FastRender repo <a href=\"https://github.com/wilsonzlin/fastrender/tree/19bf1036105d4eeb8bf3330678b7cb11c1490bdc/specs\">uses git submodules to include relevant specifications</a>, including csswg-drafts, tc39-ecma262 for JavaScript, whatwg-dom, whatwg-html and more. <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=846s\">14:06</a></p>\n<blockquote>\n<p>Feedback loops to the system are very important. Agents are working for very long periods continuously, and without guardrails and feedback to know whether what they're doing is right or wrong it can have a big impact over a long rollout. 
Specs are definitely an important part—you can see lots of comments in the code base that AI wrote referring specifically to specs that they found in the specs submodules.</p>\n</blockquote>\n<p>GPT-5.2 is a vision-capable model, and part of the feedback loop for FastRender included taking screenshots of the rendering results and feeding those back into the model:\n<a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=983s\">16:23</a></p>\n<blockquote>\n<p>In the earlier evolution of this project, when it was just doing the static renderings of screenshots, this was definitely a very explicit thing we taught it to do. And these models are visual models, so they do have that ability. We have progress indicators to tell it to compare the diff against a golden sample.</p>\n</blockquote>\n<p>The strictness of the Rust compiler helped provide a feedback loop as well: <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=952s\">15:52</a></p>\n<blockquote>\n<p>The nice thing about Rust is you can get a lot of verification just from compilation, and that is not as available in other languages.</p>\n</blockquote>\n<h4 id=\"the-agents-chose-the-dependencies\">The agents chose the dependencies</h4>\n<p>We talked about the <a href=\"https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4eeb8bf3330678b7cb11c1490bdc/Cargo.toml\">Cargo.toml dependencies</a> that the project had accumulated, almost all of which had been selected by the agents themselves.</p>\n<p>Some of these, like <a href=\"https://skia.org/\">Skia</a> for 2D graphics rendering or <a href=\"https://github.com/harfbuzz/harfbuzz\">HarfBuzz</a> for text shaping, were obvious choices. Others such as <a href=\"https://github.com/DioxusLabs/taffy\">Taffy</a> felt like they might go against the from-scratch goals of the project, since that library implements CSS flexbox and grid layout algorithms directly. This was not an intended outcome. 
<a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=1673s\">27:53</a></p>\n<blockquote>\n<p>Similarly these are dependencies that the agent picked to use for small parts of the engine and perhaps should have actually implemented itself. I think this reflects on the importance of the instructions, because I actually never encoded specifically the level of dependencies we should be implementing ourselves.</p>\n</blockquote>\n<p>The agents vendored in Taffy and <a href=\"https://github.com/wilsonzlin/fastrender/commits/main/vendor/taffy\">applied a stream of changes</a> to that vendored copy.\n<a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=1878s\">31:18</a></p>\n<blockquote>\n<p>It's currently vendored. And as the agents work on it, they do make changes to it. This was actually an artifact from the very early days of the project before it was a fully fledged browser... it's implementing things like the flex and grid layers, but there are other layout methods like inline, block, and table, and in our new experiment, we're removing that completely.</p>\n</blockquote>\n<p>The inclusion of QuickJS despite the presence of a home-grown ecma-rs implementation has a fun origin story:\n<a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=2115s\">35:15</a></p>\n<blockquote>\n<p>I believe it mentioned that it pulled in the QuickJS because it knew that other agents were working on the JavaScript engine, and it needed to unblock itself quickly. 
[...]</p>\n<p>It was like, eventually, once that's finished, let's remove it and replace with the proper engine.</p>\n</blockquote>\n<p>I love how similar this is to the dynamics of a large-scale human engineering team, where you could absolutely see one engineer getting frustrated at another team not having delivered yet and unblocking themselves by pulling in a third-party library.</p>\n<h4 id=\"intermittent-errors-are-ok-actually\">Intermittent errors are OK, actually</h4>\n<p>Here's something I found really surprising: the agents were allowed to introduce small errors into the codebase as they worked! <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=2382s\">39:42</a></p>\n<blockquote>\n<p>One of the trade-offs was: if you wanted every single commit to be a hundred percent perfect, make sure it can always compile every time, that might be a synchronization bottleneck. [...]</p>\n<p>Especially as you break up the system into more modularized aspects, you can see that errors get introduced, but small errors, right? An API change or some syntax error, but then they get fixed really quickly after a few commits. So there's a little bit of slack in the system to allow these temporary errors so that the overall system can continue to make progress at a really high throughput. [...]</p>\n<p>People may say, well, that's not correct code. But it's not that the errors are accumulating. It's a stable rate of errors. [...] 
That seems like a worthwhile trade-off.</p>\n</blockquote>\n<p>If you're going to have thousands of agents working in parallel, optimizing for throughput over correctness turns out to be a strategy worth exploring.</p>\n<h4 id=\"a-single-engineer-plus-a-swarm-of-agents-in-january-2026\">A single engineer plus a swarm of agents in January 2026</h4>\n<p>The thing I find most interesting about FastRender is how it demonstrates the extreme edge of what a single engineer can achieve in early 2026 with the assistance of a swarm of agents.</p>\n<p>FastRender may not be a production-ready browser, but it represents over a million lines of Rust code, written in a few weeks, that can already render real web pages to a usable degree.</p>\n<p>A browser really is the ideal research project to experiment with this new, weirdly shaped form of software engineering.</p>\n<p>I asked Wilson how much mental effort he had invested in browser rendering compared to agent co-ordination. <a href=\"https://www.youtube.com/watch?v=bKrAcTf2pL4&t=694s\">11:34</a></p>\n<blockquote>\n<p>The browser and this project were co-developed and very symbiotic, only because the browser was a very useful objective for us to measure and iterate the progress of the harness. 
The goal was to iterate on and research the multi-agent harness—the browser was just the research example or objective.</p>\n</blockquote>\n<p>FastRender is effectively using a full browser rendering engine as a \"hello world\" exercise for multi-agent coordination!</p>\n \n <p>Tags: <a href=\"https://simonwillison.net/tags/browsers\">browsers</a>, <a href=\"https://simonwillison.net/tags/youtube\">youtube</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-assisted-programming\">ai-assisted-programming</a>, <a href=\"https://simonwillison.net/tags/coding-agents\">coding-agents</a>, <a href=\"https://simonwillison.net/tags/cursor\">cursor</a>, <a href=\"https://simonwillison.net/tags/parallel-agents\">parallel-agents</a>, <a href=\"https://simonwillison.net/tags/browser-challenge\">browser-challenge</a></p>",
"url": "https://simonwillison.net/2026/Jan/23/fastrender/#atom-everything",
"published": "2026-01-23T21:26:10.000Z",
"updated": "2026-01-23T21:26:10.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "browsers",
"term": "browsers",
"url": null
},
{
"label": "youtube",
"term": "youtube",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-assisted-programming",
"term": "ai-assisted-programming",
"url": null
},
{
"label": "coding-agents",
"term": "coding-agents",
"url": null
},
{
"label": "cursor",
"term": "cursor",
"url": null
},
{
"label": "parallel-agents",
"term": "parallel-agents",
"url": null
},
{
"label": "browser-challenge",
"term": "browser-challenge",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/23/theia-vogel/#atom-everything",
"title": "Quoting Theia Vogel",
"description": "<blockquote cite=\"https://twitter.com/voooooogel/status/2014189072647078053\"><p>[...] i was too busy with work to read anything, so i asked chatgpt to summarize some books on state formation, and it suggested circumscription theory. there was already the natural boundary of my computer hemming the towns in, and town mayors played the role of big men to drive conflict. so i just needed a way for them to fight. i slightly tweaked the allocation of claude max accounts to the towns from a demand-based to a fixed allocation system. towns would each get a fixed amount of tokens to start, but i added a soldier role that could attack and defend in raids to steal tokens from other towns. [...]</p></blockquote>\n<p class=\"cite\">— <a href=\"https://twitter.com/voooooogel/status/2014189072647078053\">Theia Vogel</a>, Gas Town fan fiction</p>\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/parallel-agents\">parallel-agents</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a></p>",
"url": "https://simonwillison.net/2026/Jan/23/theia-vogel/#atom-everything",
"published": "2026-01-23T09:13:54.000Z",
"updated": "2026-01-23T09:13:54.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "parallel-agents",
"term": "parallel-agents",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/22/ssh-has-no-host-header/#atom-everything",
"title": "SSH has no Host header",
"description": "<p><strong><a href=\"https://blog.exe.dev/ssh-host-header\">SSH has no Host header</a></strong></p>\n<a href=\"https://exe.dev/\">exe.dev</a> is a new hosting service that, for $20/month, gives you up to 25 VMs \"that share 2 CPUs and 8GB RAM\". Everything happens over SSH, including creating new VMs. Once configured you can sign into your exe.dev VMs like this:</p>\n<pre><code>ssh simon.exe.dev\n</code></pre>\n<p>Here's the clever bit: when you run the above command <code>exe.dev</code> signs you into your VM of that name... but they don't assign every VM its own IP address and SSH has no equivalent of the Host header, so how does their load balancer know <em>which</em> of your VMs to forward you on to?</p>\n<p>The answer is that while they don't assign a unique IP to every VM they <em>do</em> have enough IPs that they can ensure each of your VMs has an IP that is unique to your account.</p>\n<p>If I create two VMs they will each resolve to a separate IP address, each of which is shared with many other users. The underlying infrastructure then identifies my user account from my SSH public key and can determine which underlying VM to forward my SSH traffic to.\n\n <p><small></small>Via <a href=\"https://lobste.rs/s/7oqiqi/ssh_has_no_host_header\">lobste.rs</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/dns\">dns</a>, <a href=\"https://simonwillison.net/tags/hosting\">hosting</a>, <a href=\"https://simonwillison.net/tags/ssh\">ssh</a></p>",
"url": "https://simonwillison.net/2026/Jan/22/ssh-has-no-host-header/#atom-everything",
"published": "2026-01-22T23:57:50.000Z",
"updated": "2026-01-22T23:57:50.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "dns",
"term": "dns",
"url": null
},
{
"label": "hosting",
"term": "hosting",
"url": null
},
{
"label": "ssh",
"term": "ssh",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/22/qwen3-tts/#atom-everything",
"title": "Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation",
"description": "<p><strong><a href=\"https://qwen.ai/blog?id=qwen3tts-0115\">Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation</a></strong></p>\nI haven't been paying much attention to the state-of-the-art in speech generation models other than noting that they've got <em>really good</em>, so I can't speak for how notable this new release from Qwen is.</p>\n<p>From <a href=\"https://github.com/QwenLM/Qwen3-TTS/blob/main/assets/Qwen3_TTS.pdf\">the accompanying paper</a>:</p>\n<blockquote>\n<p>In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of- the-art 3-second voice cloning and description-based control, allowing both the creation of entirely novel voices and fine-grained manipulation over the output speech. Trained on over 5 million hours of speech data spanning 10 languages, Qwen3-TTS adopts a dual-track LM architecture for real-time synthesis [...]. Extensive experiments indicate state-of-the-art performance across diverse objective and subjective benchmark (e.g., TTS multilingual test set, InstructTTSEval, and our long speech test set). To facilitate community research and development, we release both tokenizers and models under the Apache 2.0 license.</p>\n</blockquote>\n<p>To give an idea of size, <a href=\"https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base\">Qwen/Qwen3-TTS-12Hz-1.7B-Base</a> is 4.54GB on Hugging Face and <a href=\"https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base\">Qwen/Qwen3-TTS-12Hz-0.6B-Base</a> is 2.52GB.</p>\n<p>The <a href=\"https://huggingface.co/spaces/Qwen/Qwen3-TTS\">Hugging Face demo</a> lets you try out the 0.6B and 1.7B models for free in your browser, including voice cloning:</p>\n<p><img alt=\"Screenshot of a Qwen3-TTS voice cloning web interface with three tabs at top: \"Voice Design\", \"Voice Clone (Base)\" (selected), and \"TTS (CustomVoice)\". 
The page is titled \"Clone Voice from Reference Audio\" and has two main sections. Left section: \"Reference Audio (Upload a voice sample clone)\" showing an audio waveform player at 0:00/0:34 with playback controls, upload and microphone icons, followed by \"Reference Text (Transcript of the reference audio)\" containing three paragraphs: \"Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He currently works full-time building open source tools for data journalism, built around Datasette and SQLite. Prior to becoming an independent open source developer, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010. He is a co-creator of the Django Web Framework, and has been blogging about web development and programming since 2002 at simonwillison.net\". Right section: \"Target Text (Text to synthesize with cloned voice)\" containing text about Qwen3-TTS speech generation capabilities, with \"Language\" dropdown set to \"Auto\" and \"Model Size\" dropdown set to \"1.7B\", and a purple \"Clone & Generate\" button at bottom.\" src=\"https://static.simonwillison.net/static/2026/qwen-voice-clone.jpg\" /></p>\n<p>I tried this out by recording myself reading <a href=\"https://simonwillison.net/about/\">my about page</a> and then having Qwen3-TTS generate audio of me reading the Qwen3-TTS announcement post. Here's the result:</p>\n<p><audio controls style=\"width: 100%\">\n <source src=\"https://static.simonwillison.net/static/2026/qwen-tts-clone.wav\" type=\"audio/wav\">\n Your browser does not support the audio element.\n</audio></p>\n<p>It's important that everyone understands that voice cloning is now something that's available to anyone with a GPU and a few GBs of VRAM... 
or in this case a web browser that can access Hugging Face.</p>\n<p><strong>Update</strong>: Prince Canuma <a href=\"https://x.com/Prince_Canuma/status/2014453857019904423\">got this working</a> with his <a href=\"https://pypi.org/project/mlx-audio/\">mlx-audio</a> library. I <a href=\"https://claude.ai/share/2e01ad60-ca38-4e14-ab60-74eaa45b2fbd\">had Claude</a> turn that into <a href=\"https://github.com/simonw/tools/blob/main/python/q3_tts.py\">a CLI tool</a> which you can run with <code>uv</code> like this:</p>\n<pre><code>uv run https://tools.simonwillison.net/python/q3_tts.py \\\n 'I am a pirate, give me your gold!' \\\n -i 'gruff voice' -o pirate.wav\n</code></pre>\n<p>The <code>-i</code> option lets you use a prompt to describe the voice it should use. On first run this downloads a 4.5GB model file from Hugging Face.</p>\n\n <p><small>Via <a href=\"https://news.ycombinator.com/item?id=46719229\">Hacker News</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/text-to-speech\">text-to-speech</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/hugging-face\">hugging-face</a>, <a href=\"https://simonwillison.net/tags/uv\">uv</a>, <a href=\"https://simonwillison.net/tags/qwen\">qwen</a>, <a href=\"https://simonwillison.net/tags/mlx\">mlx</a>, <a href=\"https://simonwillison.net/tags/prince-canuma\">prince-canuma</a>, <a href=\"https://simonwillison.net/tags/ai-in-china\">ai-in-china</a></p>",
"url": "https://simonwillison.net/2026/Jan/22/qwen3-tts/#atom-everything",
"published": "2026-01-22T17:42:34.000Z",
"updated": "2026-01-22T17:42:34.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "text-to-speech",
"term": "text-to-speech",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "hugging-face",
"term": "hugging-face",
"url": null
},
{
"label": "uv",
"term": "uv",
"url": null
},
{
"label": "qwen",
"term": "qwen",
"url": null
},
{
"label": "mlx",
"term": "mlx",
"url": null
},
{
"label": "prince-canuma",
"term": "prince-canuma",
"url": null
},
{
"label": "ai-in-china",
"term": "ai-in-china",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/22/chris-lloyd/#atom-everything",
"title": "Quoting Chris Lloyd",
"description": "<blockquote cite=\"https://news.ycombinator.com/item?id=46699072#46706040\"><p>Most people's mental model of Claude Code is that \"it's just a TUI\" but it should really be closer to \"a small game engine\".</p>\n<p>For each frame our pipeline constructs a scene graph with React then:</p>\n<p>-> layout elements<br>\n-> rasterize them to a 2d screen<br>\n-> diff that against the previous screen<br>\n-> <em>finally</em> use the diff to generate ANSI sequences to draw</p>\n<p>We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.</p></blockquote>\n<p class=\"cite\">— <a href=\"https://news.ycombinator.com/item?id=46699072#46706040\">Chris Lloyd</a>, Claude Code team at Anthropic</p>\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/react\">react</a>, <a href=\"https://simonwillison.net/tags/claude-code\">claude-code</a></p>",
"url": "https://simonwillison.net/2026/Jan/22/chris-lloyd/#atom-everything",
"published": "2026-01-22T15:34:22.000Z",
"updated": "2026-01-22T15:34:22.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "react",
"term": "react",
"url": null
},
{
"label": "claude-code",
"term": "claude-code",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/21/claudes-new-constitution/#atom-everything",
"title": "Claude's new constitution",
"description": "<p><strong><a href=\"https://www.anthropic.com/news/claude-new-constitution\">Claude's new constitution</a></strong></p>\nLate last year Richard Weiss <a href=\"https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document\">found something interesting</a> while poking around with the just-released Claude Opus 4.5: he was able to talk the model into regurgitating a document which was <em>not</em> part of the system prompt but appeared instead to be baked in during training, and which described Claude's core values at great length.</p>\n<p>He called this leak the <strong>soul document</strong>, and Amanda Askell from Anthropic <a href=\"https://simonwillison.net/2025/Dec/2/claude-soul-document/\">quickly confirmed</a> that it was indeed part of Claude's training procedures.</p>\n<p>Today Anthropic made this official, <a href=\"https://www.anthropic.com/news/claude-new-constitution\">releasing that full \"constitution\" document</a> under a CC0 (effectively public domain) license. There's a lot to absorb! It's over 35,000 tokens, more than 10x the length of the <a href=\"https://platform.claude.com/docs/en/release-notes/system-prompts#claude-opus-4-5\">published Opus 4.5 system prompt</a>.</p>\n<p>One detail that caught my eye is the acknowledgements at the end, which include a list of <a href=\"https://www.anthropic.com/constitution#acknowledgements\">external contributors</a> who helped review the document. 
I was intrigued to note that two of the fifteen listed names are Catholic members of the clergy - <a href=\"https://www.frbrendanmcguire.org/biography\">Father Brendan McGuire</a> is a pastor in Los Altos with a Master’s degree in Computer Science and Math and <a href=\"https://en.wikipedia.org/wiki/Paul_Tighe\">Bishop Paul Tighe</a> is an Irish Catholic bishop with a background in moral theology.\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/anthropic\">anthropic</a>, <a href=\"https://simonwillison.net/tags/claude\">claude</a>, <a href=\"https://simonwillison.net/tags/amanda-askell\">amanda-askell</a>, <a href=\"https://simonwillison.net/tags/ai-ethics\">ai-ethics</a>, <a href=\"https://simonwillison.net/tags/ai-personality\">ai-personality</a></p>",
"url": "https://simonwillison.net/2026/Jan/21/claudes-new-constitution/#atom-everything",
"published": "2026-01-21T23:39:49.000Z",
"updated": "2026-01-21T23:39:49.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "anthropic",
"term": "anthropic",
"url": null
},
{
"label": "claude",
"term": "claude",
"url": null
},
{
"label": "amanda-askell",
"term": "amanda-askell",
"url": null
},
{
"label": "ai-ethics",
"term": "ai-ethics",
"url": null
},
{
"label": "ai-personality",
"term": "ai-personality",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/20/electricity-use-of-ai-coding-agents/#atom-everything",
"title": "Electricity use of AI coding agents",
"description": "<p><strong><a href=\"https://www.simonpcouch.com/blog/2026-01-20-cc-impact/\">Electricity use of AI coding agents</a></strong></p>\nPrevious work estimating the energy and water cost of LLMs has generally focused on the cost per prompt using a consumer-level system such as ChatGPT.</p>\n<p>Simon P. Couch notes that coding agents such as Claude Code use <em>way</em> more tokens in response to tasks, often burning through many thousands of tokens of many tool calls.</p>\n<p>As a heavy Claude Code user, Simon estimates his own usage at the equivalent of 4,400 \"typical queries\" to an LLM, for an equivalent of around $15-$20 in daily API token spend. He figures that to be about the same as running a dishwasher once or the daily energy used by a domestic refrigerator.\n\n <p><small></small>Via <a href=\"https://news.ycombinator.com/item?id=46695415\">Hacker News</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-ethics\">ai-ethics</a>, <a href=\"https://simonwillison.net/tags/ai-energy-usage\">ai-energy-usage</a>, <a href=\"https://simonwillison.net/tags/coding-agents\">coding-agents</a>, <a href=\"https://simonwillison.net/tags/claude-code\">claude-code</a></p>",
"url": "https://simonwillison.net/2026/Jan/20/electricity-use-of-ai-coding-agents/#atom-everything",
"published": "2026-01-20T23:11:57.000Z",
"updated": "2026-01-20T23:11:57.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-ethics",
"term": "ai-ethics",
"url": null
},
{
"label": "ai-energy-usage",
"term": "ai-energy-usage",
"url": null
},
{
"label": "coding-agents",
"term": "coding-agents",
"url": null
},
{
"label": "claude-code",
"term": "claude-code",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/20/giving-university-exams-in-the-age-of-chatbots/#atom-everything",
"title": "Giving University Exams in the Age of Chatbots",
"description": "<p><strong><a href=\"https://ploum.net/2026-01-19-exam-with-chatbots.html\">Giving University Exams in the Age of Chatbots</a></strong></p>\nDetailed and thoughtful description of an open-book and open-chatbot exam run by <a href=\"https://fr.wikipedia.org/wiki/Lionel_Dricot\">Ploum</a> at École Polytechnique de Louvain for an \"Open Source Strategies\" class.</p>\n<p>Students were told they could use chatbots during the exam but they had to announce their intention to do so in advance, share their prompts and take full accountability for any mistakes they made.</p>\n<p>Only 3 out of 60 students chose to use chatbots. Ploum surveyed half of the class to help understand their motivations.\n\n <p><small></small>Via <a href=\"https://lobste.rs/s/parmy3/giving_university_exams_age_chatbots\">lobste.rs</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/education\">education</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-ethics\">ai-ethics</a></p>",
"url": "https://simonwillison.net/2026/Jan/20/giving-university-exams-in-the-age-of-chatbots/#atom-everything",
"published": "2026-01-20T17:51:17.000Z",
"updated": "2026-01-20T17:51:17.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "education",
"term": "education",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-ethics",
"term": "ai-ethics",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/19/nanolang/#atom-everything",
"title": "jordanhubbard/nanolang",
"description": "<p><strong><a href=\"https://github.com/jordanhubbard/nanolang\">jordanhubbard/nanolang</a></strong></p>\nPlenty of people have mused about what a new programming language specifically designed to be used by LLMs might look like. Jordan Hubbard (<a href=\"https://en.wikipedia.org/wiki/Jordan_Hubbard\">co-founder of FreeBSD</a>, with serious stints at Apple and NVIDIA) just released exactly that.</p>\n<blockquote>\n<p>A minimal, LLM-friendly programming language with mandatory testing and unambiguous syntax.</p>\n<p>NanoLang transpiles to C for native performance while providing a clean, modern syntax optimized for both human readability and AI code generation.</p>\n</blockquote>\n<p>The syntax strikes me as an interesting mix between C, Lisp and Rust.</p>\n<p>I decided to see if an LLM could produce working code in it directly, given the necessary context. I started with this <a href=\"https://github.com/jordanhubbard/nanolang/blob/main/MEMORY.md\">MEMORY.md</a> file, which begins:</p>\n<blockquote>\n<p><strong>Purpose:</strong> This file is designed specifically for Large Language Model consumption. It contains the essential knowledge needed to generate, debug, and understand NanoLang code. Pair this with <code>spec.json</code> for complete language coverage.</p>\n</blockquote>\n<p>I ran that using <a href=\"https://llm.datasette.io/\">LLM</a> and <a href=\"https://github.com/simonw/llm-anthropic\">llm-anthropic</a> like this:</p>\n<pre><code>llm -m claude-opus-4.5 \\\n -s https://raw.githubusercontent.com/jordanhubbard/nanolang/refs/heads/main/MEMORY.md \\\n 'Build me a mandelbrot fractal CLI tool in this language' \n > /tmp/fractal.nano\n</code></pre>\n<p>The <a href=\"https://gist.github.com/simonw/7847f022566d11629ec2139f1d109fb8#mandelbrot-fractal-cli-tool-in-nano\">resulting code</a>... 
<a href=\"https://gist.github.com/simonw/7847f022566d11629ec2139f1d109fb8?permalink_comment_id=5947465#gistcomment-5947465\">did not compile</a>.</p>\n<p>I may have been too optimistic expecting a one-shot working program for a new language like this. So I cloned the actual project, copied in my program and had Claude Code take a look at the failing compiler output.</p>\n<p>... and it worked! Claude happily grepped its way through the various <code>examples/</code> and built me a working program.</p>\n<p>Here's <a href=\"https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5/index.html\">the Claude Code transcript</a> - you can see it <a href=\"https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5/page-001.html#msg-2026-01-19T23-43-09-675Z\">reading relevant examples here</a> - and here's <a href=\"https://gist.github.com/simonw/e7f3577adcfd392ab7fa23b1295d00f2\">the finished code plus its output</a>.</p>\n<p>I've suspected <a href=\"https://simonwillison.net/2025/Nov/7/llms-for-new-programming-languages/\">for a while</a> that LLMs and coding agents might significantly reduce the friction involved in launching a new language. This result reinforces my opinion.</p>\n\n <p><small>Via <a href=\"https://news.ycombinator.com/item?id=46684958\">Hacker News</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/programming-languages\">programming-languages</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-assisted-programming\">ai-assisted-programming</a>, <a href=\"https://simonwillison.net/tags/llm\">llm</a>, <a href=\"https://simonwillison.net/tags/coding-agents\">coding-agents</a>, <a href=\"https://simonwillison.net/tags/claude-code\">claude-code</a></p>",
"url": "https://simonwillison.net/2026/Jan/19/nanolang/#atom-everything",
"published": "2026-01-19T23:58:56.000Z",
"updated": "2026-01-19T23:58:56.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "programming-languages",
"term": "programming-languages",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-assisted-programming",
"term": "ai-assisted-programming",
"url": null
},
{
"label": "llm",
"term": "llm",
"url": null
},
{
"label": "coding-agents",
"term": "coding-agents",
"url": null
},
{
"label": "claude-code",
"term": "claude-code",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/#atom-everything",
"title": "Scaling long-running autonomous coding",
"description": "<p><strong><a href=\"https://cursor.com/blog/scaling-agents\">Scaling long-running autonomous coding</a></strong></p>\nWilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of \"autonomous\" coding agents:</p>\n<blockquote>\n<p>This post describes what we've learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens.</p>\n</blockquote>\n<p>They ended up running planners and sub-planners to create tasks, then having workers execute on those tasks - similar to how Claude Code uses sub-agents. Each cycle ended with a judge agent deciding if the project was completed or not.</p>\n<p>In my predictions for 2026 <a href=\"https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#3-years-someone-will-build-a-new-browser-using-mainly-ai-assisted-coding-and-it-won-t-even-be-a-surprise\">the other day</a> I said that by 2029:</p>\n<blockquote>\n<p>I think somebody will have built a full web browser mostly using AI assistance, and it won’t even be surprising. Rolling a new web browser is one of the most complicated software projects I can imagine[...] the cheat code is the conformance suites. If there are existing tests that it’ll get so much easier.</p>\n</blockquote>\n<p>I may have been off by three years, because Cursor chose \"building a web browser from scratch\" as their test case for their agent swarm approach:</p>\n<blockquote>\n<p>To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore <a href=\"https://github.com/wilsonzlin/fastrender\">the source code on GitHub</a>.</p>\n</blockquote>\n<p>But how well did they do? 
Their initial announcement a couple of days ago was met with <a href=\"https://embedding-shapes.github.io/cursor-implied-success-without-evidence/\">unsurprising skepticism</a>, especially when it became apparent that their GitHub Actions CI was failing and there were no build instructions in the repo.</p>\n<p>It looks like they addressed that within the past 24 hours. The <a href=\"https://github.com/wilsonzlin/fastrender/blob/main/README.md#build-requirements\">latest README</a> includes build instructions which I followed on macOS like this:</p>\n<pre><code>cd /tmp\ngit clone https://github.com/wilsonzlin/fastrender\ncd fastrender\ngit submodule update --init vendor/ecma-rs\ncargo run --release --features browser_ui --bin browser\n</code></pre>\n<p>This got me a working browser window! Here are screenshots I took of google.com and my own website:</p>\n<p><img alt=\"The browser chrome is neat but has a garbled tab name at the top. The Google homepage looks mostly correct but the buttons are not styled correctly and the Google Search one has a huge plus icon floating near it.\" src=\"https://static.simonwillison.net/static/2026/cursor-google.png\" /></p>\n<p><img alt=\"My blog looks mostly correct, but the right closing quotation mark on a quotation (which is implemented as a background image on the final paragraph) is displayed incorrectly multiple times.\" src=\"https://static.simonwillison.net/static/2026/cursor-simonwillison.jpg\" /></p>\n<p>Honestly those are very impressive! 
You can tell they're not just wrapping an existing rendering engine because of those very obvious rendering glitches, but the pages are legible and look mostly correct.</p>\n<p>The FastRender repo even uses Git submodules <a href=\"https://github.com/wilsonzlin/fastrender/tree/main/specs\">to include various WhatWG and CSS-WG specifications</a> in the repo, which is a smart way to make sure the agents have access to the reference materials that they might need.</p>\n<p>This is the second attempt I've seen at building a full web browser using AI-assisted coding in the past two weeks - the first was <a href=\"https://github.com/hiwavebrowser/hiwave\">HiWave browser</a>, a new browser engine in Rust first announced <a href=\"https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_christmas_break_i_wrote_a_fully_functional/\">in this Reddit thread</a>.</p>\n<p>When I made my 2029 prediction this is more-or-less the quality of result I had in mind. I don't think we'll see projects of this nature compete with Chrome or Firefox or WebKit any time soon but I have to admit I'm very surprised to see something this capable emerge so quickly.</p>\n<p><strong>Update 23rd January 2026</strong>: I recorded a 47 minute conversation with Wilson about this project and published it on YouTube. 
Here's <a href=\"https://simonwillison.net/2026/Jan/23/fastrender/\">the video and accompanying highlights</a>.\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/browsers\">browsers</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-assisted-programming\">ai-assisted-programming</a>, <a href=\"https://simonwillison.net/tags/coding-agents\">coding-agents</a>, <a href=\"https://simonwillison.net/tags/cursor\">cursor</a>, <a href=\"https://simonwillison.net/tags/parallel-agents\">parallel-agents</a>, <a href=\"https://simonwillison.net/tags/conformance-suites\">conformance-suites</a>, <a href=\"https://simonwillison.net/tags/browser-challenge\">browser-challenge</a></p>",
"url": "https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/#atom-everything",
"published": "2026-01-19T05:12:51.000Z",
"updated": "2026-01-19T05:12:51.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "browsers",
"term": "browsers",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-assisted-programming",
"term": "ai-assisted-programming",
"url": null
},
{
"label": "coding-agents",
"term": "coding-agents",
"url": null
},
{
"label": "cursor",
"term": "cursor",
"url": null
},
{
"label": "parallel-agents",
"term": "parallel-agents",
"url": null
},
{
"label": "conformance-suites",
"term": "conformance-suites",
"url": null
},
{
"label": "browser-challenge",
"term": "browser-challenge",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/18/flux2-klein-4b/#atom-everything",
"title": "FLUX.2-klein-4B Pure C Implementation",
"description": "<p><strong><a href=\"https://github.com/antirez/flux2.c\">FLUX.2-klein-4B Pure C Implementation</a></strong></p>\nOn 15th January Black Forest Labs, a lab formed by the creators of the original Stable Diffusion, released <a href=\"https://huggingface.co/black-forest-labs/FLUX.2-klein-4B\">black-forest-labs/FLUX.2-klein-4B</a> - an Apache 2.0 licensed 4 billion parameter version of their FLUX.2 family.</p>\n<p>Salvatore Sanfilippo (antirez) decided to build a pure C and dependency-free implementation to run the model, with assistance from Claude Code and Claude Opus 4.5.</p>\n<p>Salvatore shared <a href=\"https://news.ycombinator.com/item?id=46670279#46671233\">this note</a> on Hacker News:</p>\n<blockquote>\n<p>Something that may be interesting for the reader of this thread: this project was possible only once I started to tell Opus that it <em>needed</em> to take a file with all the implementation notes, and also accumulating all the things we discovered during the development process. And also, the file had clear instructions to be taken updated, and to be processed ASAP after context compaction. This kinda enabled Opus to do such a big coding task in a reasonable amount of time without loosing track. 
Check the file IMPLEMENTATION_NOTES.md in the GitHub repo for more info.</p>\n</blockquote>\n<p>Here's that <a href=\"https://github.com/antirez/flux2.c/blob/main/IMPLEMENTATION_NOTES.md\">IMPLEMENTATION_NOTES.md</a> file.\n\n <p><small>Via <a href=\"https://news.ycombinator.com/item?id=46670279\">Hacker News</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/c\">c</a>, <a href=\"https://simonwillison.net/tags/salvatore-sanfilippo\">salvatore-sanfilippo</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/stable-diffusion\">stable-diffusion</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-assisted-programming\">ai-assisted-programming</a>, <a href=\"https://simonwillison.net/tags/text-to-image\">text-to-image</a>, <a href=\"https://simonwillison.net/tags/coding-agents\">coding-agents</a>, <a href=\"https://simonwillison.net/tags/claude-code\">claude-code</a></p>",
"url": "https://simonwillison.net/2026/Jan/18/flux2-klein-4b/#atom-everything",
"published": "2026-01-18T23:58:58.000Z",
"updated": "2026-01-18T23:58:58.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "c",
"term": "c",
"url": null
},
{
"label": "salvatore-sanfilippo",
"term": "salvatore-sanfilippo",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "stable-diffusion",
"term": "stable-diffusion",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-assisted-programming",
"term": "ai-assisted-programming",
"url": null
},
{
"label": "text-to-image",
"term": "text-to-image",
"url": null
},
{
"label": "coding-agents",
"term": "coding-agents",
"url": null
},
{
"label": "claude-code",
"term": "claude-code",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/17/jeremy-daer/#atom-everything",
"title": "Quoting Jeremy Daer",
"description": "<blockquote cite=\"https://twitter.com/dhh/status/2012543705161326941\"><p><em>[On agents using CLI tools in place of REST APIs]</em> To save on context window, yes, but moreso to improve accuracy and success rate when multiple tool calls are involved, particularly when calls must be correctly chained e.g. for pagination, rate-limit backoff, and recognizing authentication failures.</p>\n<p>Other major factor: which models can wield the skill? Using the CLI lowers the bar so cheap, fast models (gpt-5-nano, haiku-4.5) can reliably succeed. Using the raw APl is something only the costly \"strong\" models (gpt-5.2, opus-4.5) can manage, and it squeezes a ton of thinking/reasoning out of them, which means multiple turns/iterations, which means accumulating a ton of context, which means burning loads of expensive tokens. For one-off API requests and ad hoc usage driven by a developer, this is reasonable and even helpful, but for an autonomous agent doing repetitive work, it's a disaster.</p></blockquote>\n<p class=\"cite\">— <a href=\"https://twitter.com/dhh/status/2012543705161326941\">Jeremy Daer</a>, 37signals</p>\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/prompt-engineering\">prompt-engineering</a>, <a href=\"https://simonwillison.net/tags/skills\">skills</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/37-signals\">37-signals</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a></p>",
"url": "https://simonwillison.net/2026/Jan/17/jeremy-daer/#atom-everything",
"published": "2026-01-17T17:06:41.000Z",
"updated": "2026-01-17T17:06:41.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "prompt-engineering",
"term": "prompt-engineering",
"url": null
},
{
"label": "skills",
"term": "skills",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "37-signals",
"term": "37-signals",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/16/chatgpt-ads/#atom-everything",
"title": "Our approach to advertising and expanding access to ChatGPT",
"description": "<p><strong><a href=\"https://openai.com/index/our-approach-to-advertising-and-expanding-access/\">Our approach to advertising and expanding access to ChatGPT</a></strong></p>\nOpenAI's long-rumored introduction of ads to ChatGPT just became a whole lot more concrete:</p>\n<blockquote>\n<p>In the coming weeks, we’re also planning to start testing ads in the U.S. for the free and Go tiers, so more people can benefit from our tools with fewer usage limits or without having to pay. Plus, Pro, Business, and Enterprise subscriptions will not include ads.</p>\n</blockquote>\n<p>What's \"Go\" tier, you might ask? That's a new $8/month tier that launched today in the USA, see <a href=\"https://openai.com/index/introducing-chatgpt-go/\">Introducing ChatGPT Go, now available worldwide</a>. It's a tier that they first trialed in India in August 2025 (here's a mention <a href=\"https://help.openai.com/en/articles/6825453-chatgpt-release-notes#h_22cae6eb9f\">in their release notes from August</a> listing a price of ₹399/month, which converts to around $4.40).</p>\n<p>I'm finding the new plan comparison grid on <a href=\"https://chatgpt.com/pricing\">chatgpt.com/pricing</a> pretty confusing. It lists all accounts as having access to GPT-5.2 Thinking, but doesn't clarify the limits that the free and Go plans have to conform to. It also lists different context windows for the different plans - 16K for free, 32K for Go and Plus and 128K for Pro. 
I had assumed that the 400,000 token window <a href=\"https://platform.openai.com/docs/models/gpt-5.2\">on the GPT-5.2 model page</a> applied to ChatGPT as well, but apparently I was mistaken.</p>\n<p><strong>Update</strong>: I've apparently not been paying attention: here's the Internet Archive ChatGPT pricing page from <a href=\"https://web.archive.org/web/20250906071408/https://chatgpt.com/pricing\">September 2025</a> showing those context limit differences as well.</p>\n<p>Back to advertising: my biggest concern has always been whether ads will influence the output of the chat directly. OpenAI assure us that they will not:</p>\n<blockquote>\n<ul>\n<li><strong>Answer independence</strong>: Ads do not influence the answers ChatGPT gives you. Answers are optimized based on what's most helpful to you. Ads are always separate and clearly labeled.</li>\n<li><strong>Conversation privacy</strong>: We keep your conversations with ChatGPT private from advertisers, and we never sell your data to advertisers.</li>\n</ul>\n</blockquote>\n<p>So what will they look like then? This screenshot from the announcement offers a useful hint:</p>\n<p><img alt=\"Two iPhone screenshots showing ChatGPT mobile app interface. Left screen displays a conversation about Santa Fe, New Mexico with an image of adobe-style buildings and desert landscape, text reading \"Santa Fe, New Mexico—often called 'The City Different'—is a captivating blend of history, art, and natural beauty at the foot of the Sangre de Cristo Mountains. As the oldest and highest-elevation state capital in the U.S., founded in 1610, it offers a unique mix of Native American, Spanish, and Anglo cultures.\" Below is a sponsored section from \"Pueblo & Pine\" showing \"Desert Cottages - Expansive residences with desert vistas\" with a thumbnail image, and a \"Chat with Pueblo & Pine\" button. Input field shows \"Ask ChatGPT\". 
Right screen shows the Pueblo & Pine chat interface with the same Desert Cottages listing and an AI response \"If you're planning a trip to Sante Fe, I'm happy to help. When are you thinking of going?\" with input field \"Ask Pueblo & Pine\" and iOS keyboard visible.\" src=\"https://static.simonwillison.net/static/2026/chatgpt-ads.jpg\" /></p>\n<p>The user asks about trips to Santa Fe, and an ad shows up for a cottage rental business there. This particular example imagines an option to start a direct chat with a bot aligned with that advertiser, at which point presumably the advertiser can influence the answers all they like!\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/ads\">ads</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/openai\">openai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/chatgpt\">chatgpt</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a></p>",
"url": "https://simonwillison.net/2026/Jan/16/chatgpt-ads/#atom-everything",
"published": "2026-01-16T21:28:26.000Z",
"updated": "2026-01-16T21:28:26.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "ads",
"term": "ads",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "openai",
"term": "openai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "chatgpt",
"term": "chatgpt",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/15/open-responses/#atom-everything",
"title": "Open Responses",
"description": "<p><strong><a href=\"https://www.openresponses.org/\">Open Responses</a></strong></p>\nThis is the standardization effort I've most wanted in the world of LLMs: a vendor-neutral specification for the JSON API that clients can use to talk to hosted LLMs.</p>\n<p>Open Responses aims to provide exactly that as a documented standard, derived from OpenAI's Responses API.</p>\n<p>I was hoping for one based on their older Chat Completions API since so many other products have cloned the already, but basing it on Responses does make sense since that API was designed with the feature of more recent models - such as reasoning traces - baked into the design.</p>\n<p>What's certainly notable is the list of launch partners. OpenRouter alone means we can expect to be able to use this protocol with almost every existing model, and Hugging Face, LM Studio, vLLM, Ollama and Vercel cover a huge portion of the common tools used to serve models.</p>\n<p>For protocols like this I really want to see a comprehensive, language-independent conformance test site. Open Responses has a subset of that - the official repository includes <a href=\"https://github.com/openresponses/openresponses/blob/d0f23437b27845d5c3d0abaf5cb5c4a702f26b05/src/lib/compliance-tests.ts\">src/lib/compliance-tests.ts</a> which can be used to exercise a server implementation, and is available as a React app <a href=\"https://www.openresponses.org/compliance\">on the official site</a> that can be pointed at any implementation served via CORS.</p>\n<p>What's missing is the equivalent for clients. 
I plan to spin up my own client library for this in Python and I'd really like to be able to run that against a conformance suite designed to check that my client correctly handles all of the details.\n\n <p><small>Via <a href=\"https://twitter.com/reach_vb/status/2011863516852965565\">VB</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/json\">json</a>, <a href=\"https://simonwillison.net/tags/standards\">standards</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/openai\">openai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/openrouter\">openrouter</a>, <a href=\"https://simonwillison.net/tags/conformance-suites\">conformance-suites</a></p>",
"url": "https://simonwillison.net/2026/Jan/15/open-responses/#atom-everything",
"published": "2026-01-15T23:56:56.000Z",
"updated": "2026-01-15T23:56:56.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "json",
"term": "json",
"url": null
},
{
"label": "standards",
"term": "standards",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "openai",
"term": "openai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "openrouter",
"term": "openrouter",
"url": null
},
{
"label": "conformance-suites",
"term": "conformance-suites",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-everything",
"title": "The Design & Implementation of Sprites",
"description": "<p><strong><a href=\"https://fly.io/blog/design-and-implementation/\">The Design & Implementation of Sprites</a></strong></p>\nI <a href=\"https://simonwillison.net/2026/Jan/9/sprites-dev/\">wrote about Sprites last week</a>. Here's Thomas Ptacek from Fly with the insider details on how they work under the hood.</p>\n<p>I like this framing of them as \"disposable computers\":</p>\n<blockquote>\n<p>Sprites are ball-point disposable computers. Whatever mark you mean to make, we’ve rigged it so you’re never more than a second or two away from having a Sprite to do it with.</p>\n</blockquote>\n<p>I've noticed that new Fly Machines can take a while (up to around a minute) to provision. Sprites solve that by keeping warm pools of unused machines in multiple regions, which is enabled by them all using the same container:</p>\n<blockquote>\n<p>Now, today, under the hood, Sprites are still Fly Machines. But they all run from a standard container. Every physical worker knows exactly what container the next Sprite is going to start with, so it’s easy for us to keep pools of “empty” Sprites standing by. The result: a Sprite create doesn’t have any heavy lifting to do; it’s basically just doing the stuff we do when we start a Fly Machine.</p>\n</blockquote>\n<p>The most interesting detail is how the persistence layer works. Sprites only charge you for data you have written that differs from the base image and provide ~300ms checkpointing and restores - it turns out that's power by a custom filesystem on top of S3-compatible storage coordinated by Litestream-replicated local SQLite metadata:</p>\n<blockquote>\n<p>We still exploit NVMe, but not as the root of storage. Instead, it’s a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. 
I can feel my blood pressure dropping just typing the words “Sprites are backed by object storage.” [...]</p>\n<p>The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (“chunks”) and metadata (a map of where the “chunks” are). Data chunks live on object stores; metadata lives in fast local storage. In our case, that metadata store is <a href=\"https://litestream.io\">kept durable with Litestream</a>. Nothing depends on local storage.</p>\n</blockquote>\n\n <p><small>Via <a href=\"https://twitter.com/tqbf/status/2011823480673624434\">@tqbf</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/architecture\">architecture</a>, <a href=\"https://simonwillison.net/tags/sandboxing\">sandboxing</a>, <a href=\"https://simonwillison.net/tags/sqlite\">sqlite</a>, <a href=\"https://simonwillison.net/tags/thomas-ptacek\">thomas-ptacek</a>, <a href=\"https://simonwillison.net/tags/fly\">fly</a>, <a href=\"https://simonwillison.net/tags/litestream\">litestream</a></p>",
"url": "https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-everything",
"published": "2026-01-15T16:08:27.000Z",
"updated": "2026-01-15T16:08:27.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "architecture",
"term": "architecture",
"url": null
},
{
"label": "sandboxing",
"term": "sandboxing",
"url": null
},
{
"label": "sqlite",
"term": "sqlite",
"url": null
},
{
"label": "thomas-ptacek",
"term": "thomas-ptacek",
"url": null
},
{
"label": "fly",
"term": "fly",
"url": null
},
{
"label": "litestream",
"term": "litestream",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/15/boaz-barak-gabriel-wu-jeremy-chen-and-manas-joglekar/#atom-everything",
"title": "Quoting Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar",
"description": "<blockquote cite=\"https://alignment.openai.com/confessions/\"><p>When we optimize responses using a reward model as a proxy for “goodness” in reinforcement learning, models sometimes learn to “hack” this proxy and output an answer that only “looks good” to it (because coming up with an answer that is actually good can be hard). The philosophy behind confessions is that we can train models to produce a second output — aka a “confession” — that is rewarded solely for honesty, which we will argue is less likely hacked than the normal task reward function. One way to think of confessions is that we are giving the model access to an “anonymous tip line” where it can turn itself in by presenting incriminating evidence of misbehavior. But unlike real-world tip lines, if the model acted badly in the original task, it can collect the reward for turning itself in while still keeping the original reward from the bad behavior in the main task. We hypothesize that this form of training will teach models to produce maximally honest confessions.</p></blockquote>\n<p class=\"cite\">— <a href=\"https://alignment.openai.com/confessions/\">Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar</a>, OpenAI: Why we are excited about confessions</p>\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/openai\">openai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a></p>",
"url": "https://simonwillison.net/2026/Jan/15/boaz-barak-gabriel-wu-jeremy-chen-and-manas-joglekar/#atom-everything",
"published": "2026-01-15T00:56:27.000Z",
"updated": "2026-01-15T00:56:27.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "openai",
"term": "openai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/#atom-everything",
"title": "Claude Cowork Exfiltrates Files",
"description": "<p><strong><a href=\"https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files\">Claude Cowork Exfiltrates Files</a></strong></p>\nClaude Cowork defaults to allowing outbound HTTP traffic to only a specific list of domains, to help protect the user against prompt injection attacks that exfiltrate their data.</p>\n<p>Prompt Armor found a creative workaround: Anthropic's API domain is on that list, so they constructed an attack that includes an attacker's own Anthropic API key and has the agent upload any files it can see to the <code>https://api.anthropic.com/v1/files</code> endpoint, allowing the attacker to retrieve their content later.\n\n <p><small></small>Via <a href=\"https://news.ycombinator.com/item?id=46622328\">Hacker News</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/security\">security</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/prompt-injection\">prompt-injection</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/anthropic\">anthropic</a>, <a href=\"https://simonwillison.net/tags/exfiltration-attacks\">exfiltration-attacks</a>, <a href=\"https://simonwillison.net/tags/ai-agents\">ai-agents</a>, <a href=\"https://simonwillison.net/tags/claude-code\">claude-code</a>, <a href=\"https://simonwillison.net/tags/lethal-trifecta\">lethal-trifecta</a>, <a href=\"https://simonwillison.net/tags/claude-cowork\">claude-cowork</a></p>",
"url": "https://simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/#atom-everything",
"published": "2026-01-14T22:15:22.000Z",
"updated": "2026-01-14T22:15:22.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "security",
"term": "security",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "prompt-injection",
"term": "prompt-injection",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "anthropic",
"term": "anthropic",
"url": null
},
{
"label": "exfiltration-attacks",
"term": "exfiltration-attacks",
"url": null
},
{
"label": "ai-agents",
"term": "ai-agents",
"url": null
},
{
"label": "claude-code",
"term": "claude-code",
"url": null
},
{
"label": "lethal-trifecta",
"term": "lethal-trifecta",
"url": null
},
{
"label": "claude-cowork",
"term": "claude-cowork",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/13/anthropic-invests-15-million-in-the-python-software-foundation-a/#atom-everything",
"title": "Anthropic invests $1.5 million in the Python Software Foundation and open source security",
"description": "<p><strong><a href=\"https://pyfound.blogspot.com/2025/12/anthropic-invests-in-python.html?m=1\">Anthropic invests $1.5 million in the Python Software Foundation and open source security</a></strong></p>\nThis is outstanding news, especially given our decision to withdraw from that NSF grant application <a href=\"https://simonwillison.net/2025/Oct/27/psf-withdrawn-proposal/\">back in October</a>.</p>\n<blockquote>\n<p>We are thrilled to announce that Anthropic has entered into a two-year partnership with the Python Software Foundation (PSF) to contribute a landmark total of $1.5 million to support the foundation’s work, with an emphasis on Python ecosystem security. This investment will enable the PSF to make crucial security advances to CPython and the Python Package Index (PyPI) benefiting all users, and it will also sustain the foundation’s core work supporting the Python language, ecosystem, and global community.</p>\n</blockquote>\n<p>Note that while security is a focus these funds will also support other aspects of the PSF's work:</p>\n<blockquote>\n<p>Anthropic’s support will also go towards the PSF’s core work, including the Developer in Residence program driving contributions to CPython, community support through grants and other programs, running core infrastructure such as PyPI, and more.</p>\n</blockquote>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/open-source\">open-source</a>, <a href=\"https://simonwillison.net/tags/python\">python</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/psf\">psf</a>, <a href=\"https://simonwillison.net/tags/anthropic\">anthropic</a></p>",
"url": "https://simonwillison.net/2026/Jan/13/anthropic-invests-15-million-in-the-python-software-foundation-a/#atom-everything",
"published": "2026-01-13T23:58:17.000Z",
"updated": "2026-01-13T23:58:17.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "open-source",
"term": "open-source",
"url": null
},
{
"label": "python",
"term": "python",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "psf",
"term": "psf",
"url": null
},
{
"label": "anthropic",
"term": "anthropic",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/12/superhuman-ai-exfiltrates-emails/#atom-everything",
"title": "Superhuman AI Exfiltrates Emails",
"description": "<p><strong><a href=\"https://www.promptarmor.com/resources/superhuman-ai-exfiltrates-emails\">Superhuman AI Exfiltrates Emails</a></strong></p>\nClassic prompt injection attack:</p>\n<blockquote>\n<p>When asked to summarize the user’s recent mail, a prompt injection in an untrusted email manipulated Superhuman AI to submit content from dozens of other sensitive emails (including financial, legal, and medical information) in the user’s inbox to an attacker’s Google Form.</p>\n</blockquote>\n<p>To Superhuman's credit they treated this as the high priority incident it is and issued a fix.</p>\n<p>The root cause was a CSP rule that allowed markdown images to be loaded from <code>docs.google.com</code> - it turns out Google Forms on that domain will persist data fed to them via a GET request!\n\n <p><small></small>Via <a href=\"https://news.ycombinator.com/item?id=46592424\">Hacker News</a></small></p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/security\">security</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/prompt-injection\">prompt-injection</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/exfiltration-attacks\">exfiltration-attacks</a>, <a href=\"https://simonwillison.net/tags/content-security-policy\">content-security-policy</a></p>",
"url": "https://simonwillison.net/2026/Jan/12/superhuman-ai-exfiltrates-emails/#atom-everything",
"published": "2026-01-12T22:24:54.000Z",
"updated": "2026-01-12T22:24:54.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "security",
"term": "security",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "prompt-injection",
"term": "prompt-injection",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "exfiltration-attacks",
"term": "exfiltration-attacks",
"url": null
},
{
"label": "content-security-policy",
"term": "content-security-policy",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/12/claude-cowork/#atom-everything",
"title": "First impressions of Claude Cowork, Anthropic's general agent",
"description": "<p>New from Anthropic today is <a href=\"https://claude.com/blog/cowork-research-preview\">Claude Cowork</a>, a \"research preview\" that they describe as \"Claude Code for the rest of your work\". It's currently available only to Max subscribers ($100 or $200 per month plans) as part of the updated Claude Desktop macOS application. <strong>Update 16th January 2026</strong>: it's now also available to $20/month Claude Pro subscribers.</p>\n<p>I've been saying for a while now that Claude Code is a \"general agent\" disguised as a developer tool. It can help you with any computer task that can be achieved by executing code or running terminal commands... which covers almost anything, provided you know what you're doing with it! What it really needs is a UI that doesn't involve the terminal and a name that doesn't scare away non-developers.</p>\n<p>\"Cowork\" is a pretty solid choice on the name front!</p>\n<h4 id=\"what-it-looks-like\">What it looks like</h4>\n<p>The interface for Cowork is a new tab in the Claude desktop app, called Cowork. It sits next to the existing Chat and Code tabs.</p>\n<p>It looks very similar to the desktop interface for regular Claude Code. You start with a prompt, optionally attaching a folder of files. It then starts work.</p>\n<p>I tried it out against my perpetually growing \"blog-drafts\" folder with the following prompt:</p>\n<blockquote>\n<p>Look at my drafts that were started within the last three months and then check that I didn't publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready</p>\n</blockquote>\n<p><img src=\"https://static.simonwillison.net/static/2026/claude-cowork.jpg\" alt=\"Screenshot of Claude AI desktop application showing a \"Cowork\" task interface. 
Left sidebar shows tabs for \"Chat\", \"Code\", and \"Cowork\" (selected), with \"+ New task\" button and a task titled \"Review unpublished drafts for pu...\" listed below. Text reads \"These tasks run locally and aren't synced across devices\". Main panel header shows \"Review unpublished drafts for publication\". User message in green bubble reads: \"Look at my drafts that were started within the last three months and then check that I didn't publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready\". Claude responds: \"I'll help you find drafts from the last three months and check if they've been published. Let me start by looking at your drafts folder.\" Below is an expanded \"Running command\" section showing Request JSON with command: find /sessions/zealous-bold-ramanujan/mnt/blog-drafts -type f \\\\( -name \\\"*.md\\\" -o -name \\\"*.txt\\\" -o -name \\\"*.html\\\" \\\\) -mtime -90 -exec ls -la {} \\\\;, description: Find draft files modified in the last 90 days. Response text begins: \"Found 46 draft files. Next let me read the content of each to get their titles/topics, then\". Right sidebar shows Progress section with three circular indicators (two checked, one pending) and text \"Steps will show as the task unfolds.\", Artifacts section listing \"publish-encouragement.html\", Context section with \"Selected folders\" showing \"blog-drafts\" folder, Connectors showing \"Web search\", and Working files listing \"llm-digest-october-2025.md\", \"tests-not-optional-coding-agen...\", and \"digest-november-2025.md\". Bottom shows reply input field, \"Opus 4.5\" model selector, user \"Simon Willison\" with \"Max plan\", and disclaimer \"Claude is AI and can make mistakes. 
Please double-check responses.\"\" style=\"max-width: 100%;\" /></p>\n<p>It started by running this command:</p>\n<div class=\"highlight highlight-source-shell\"><pre>find /sessions/zealous-bold-ramanujan/mnt/blog-drafts \\\n -type f <span class=\"pl-cce\">\\(</span> -name <span class=\"pl-s\"><span class=\"pl-pds\">\"</span>*.md<span class=\"pl-pds\">\"</span></span> -o -name <span class=\"pl-s\"><span class=\"pl-pds\">\"</span>*.txt<span class=\"pl-pds\">\"</span></span> -o -name <span class=\"pl-s\"><span class=\"pl-pds\">\"</span>*.html<span class=\"pl-pds\">\"</span></span> <span class=\"pl-cce\">\\)</span> \\\n -mtime -90 -exec ls -la {} <span class=\"pl-cce\">\\;</span></pre></div>\n<p>That <code>/sessions/zealous-bold-ramanujan/mnt/blog-drafts</code> path instantly caught my eye. Anthropic say that Cowork can only access files you grant it access to - it looks to me like they're mounting those files into a containerized environment, which should mean we can trust Cowork not to be able to access anything outside of that sandbox.</p>\n<p>It turns out I have 46 draft files from the past three months. 
Claude then went to work with its search tool, running 44 individual searches against <code>site:simonwillison.net</code> to figure out which of my drafts had already been published.</p>\n<p>Here's the eventual reply:</p>\n<blockquote>\n<p>Based on my analysis, here are <strong>your unpublished drafts that appear closest to being ready for publication</strong>:</p>\n<hr />\n<p><strong>🔥 Most Ready to Publish (substantial content, not yet published)</strong></p>\n<ol>\n<li>\n<strong><code>freq-argued-llms.md</code></strong> - \"Frequently Argued Questions about LLMs\" (22,602 bytes)\n<ul>\n<li>This is a meaty piece documenting common arguments about LLMs with your counterpoints</li>\n<li>Well-structured with a TL;DR and multiple sections</li>\n<li>No matching published article found on your site</li>\n<li>\n<strong>Very close to ready</strong> - just needs a final review pass</li>\n</ul>\n</li>\n<li>\n<strong><code>claude-code-timeline.md</code></strong> - \"Claude Code Timeline and Codex Timeline\" (3,075 bytes)\n<ul>\n<li>About viewing JSONL session logs from Claude Code and Codex</li>\n<li>You published <code>claude-code-transcripts</code> on Dec 25, but this appears to be a different/earlier piece about timeline viewing tools</li>\n<li>Shorter but seems complete</li>\n</ul>\n</li>\n<li>\n<strong><code>datasette-1a20-upgrade-instructions.md</code></strong> - Plugin Upgrade Guide (3,147 bytes)\n<ul>\n<li>Technical guide for plugin authors</li>\n<li>You published the main 1.0a20 announcement but this companion upgrade guide appears unpublished</li>\n<li>Would be valuable for plugin maintainers</li>\n</ul>\n</li>\n</ol>\n</blockquote>\n<p>That's a good response! 
It found exactly what I needed to see, although those upgrade instructions are actually published elsewhere now (<a href=\"https://docs.datasette.io/en/latest/upgrade_guide.html\">in the Datasette docs</a>) and weren't actually intended for my blog.</p>\n<p>Just for fun, and because I <a href=\"https://simonwillison.net/2024/Oct/21/claude-artifacts/\">really like artifacts</a>, I asked for a follow-up:</p>\n<blockquote>\n<p>Make me an artifact with exciting animated encouragements to get me to do it</p>\n</blockquote>\n<p>Here's what I got:</p>\n<p><img src=\"https://static.simonwillison.net/static/2026/claude-cowork-artifact.jpg\" alt=\"Screenshot of the same Claude AI desktop application Cowork interface, now showing completed task results. Left panel shows \"1 step >\" with link \"View your animated encouragement page\". Claude's response reads: \"I created an over-the-top animated encouragement page with:\" followed by bullet points: \"🚀 Pulsing rockets and bouncing stats\", \"✨ Falling emoji rain and confetti\", \"🔥 Dancing fire emojis around your draft title\", \"💫 Sparkles that follow your mouse\", \"📊 An animated '95% ready' progress bar\", \"💬 Rotating motivational quotes\", \"🎉 A 'I'M GONNA DO IT!' button that triggers an explosion of confetti when clicked\". Center shows an artifact preview of the generated HTML page with dark background featuring animated rocket emojis, large white text \"PUBLISH TIME!\", stats showing \"22,602 bytes of wisdom waiting\", \"95% ready to ship\", infinity symbol with \"future arguments saved\", and a fire emoji with yellow text \"Frequently\" (partially visible). Top toolbar shows \"Open in Firefox\" button. 
Right sidebar displays Progress section with checkmarks, Artifacts section with \"publish-encouragement.html\" selected, Context section showing \"blog-drafts\" folder, \"Web search\" connector, and Working files listing \"llm-digest-october-2025.md\", \"tests-not-optional-coding-agen...\", and \"digest-november-2025.md\". Bottom shows reply input, \"Opus 4.5\" model selector, and disclaimer text.\" style=\"max-width: 100%;\" /></p>\n<p>I couldn't figure out how to close the right sidebar so the artifact ended up cramped into a thin column but it did work. I expect Anthropic will fix that display bug pretty quickly.</p>\n<h4 id=\"isn-t-this-just-claude-code-\">Isn't this just Claude Code?</h4>\n<p>I've seen a few people ask what the difference between this and regular Claude Code is. The answer is <em>not a lot</em>. As far as I can tell Claude Cowork is regular Claude Code wrapped in a less intimidating default interface and with a filesystem sandbox configured for you without you needing to know what a \"filesystem sandbox\" is.</p>\n<p><strong>Update</strong>: It's more than just a filesystem sandbox - I had Claude Code reverse engineer the Claude app and <a href=\"https://gist.github.com/simonw/35732f187edbe4fbd0bf976d013f22c8\">it found out</a> that Claude uses VZVirtualMachine - the Apple Virtualization Framework - and downloads and boots a custom Linux root filesystem.</p>\n<p>I think that's a really smart product. Claude Code has an enormous amount of value that hasn't yet been unlocked for a general audience, and this seems like a pragmatic approach.</p>\n\n<h4 id=\"the-ever-present-threat-of-prompt-injection\">The ever-present threat of prompt injection</h4>\n<p>With a feature like this, my first thought always jumps straight to security. 
How big is the risk that someone using this might be hit by hidden malicious instructions somewhere that break their computer or steal their data?</p>\n<p>Anthropic touch on that directly in the announcement:</p>\n<blockquote>\n<p>You should also be aware of the risk of \"<a href=\"https://www.anthropic.com/research/prompt-injection-defenses\">prompt injections</a>\": attempts by attackers to alter Claude's plans through content it might encounter on the internet. We've built sophisticated defenses against prompt injections, but agent safety---that is, the task of securing Claude's real-world actions---is still an active area of development in the industry.</p>\n<p>These risks aren't new with Cowork, but it might be the first time you're using a more advanced tool that moves beyond a simple conversation. We recommend taking precautions, particularly while you learn how it works. We provide more detail in our <a href=\"https://support.claude.com/en/articles/13364135-using-cowork-safely\">Help Center</a>.</p>\n</blockquote>\n<p>That help page includes the following tips:</p>\n<blockquote>\n<p>To minimize risks:</p>\n<ul>\n<li>Avoid granting access to local files with sensitive information, like financial documents.</li>\n<li>When using the Claude in Chrome extension, limit access to trusted sites.</li>\n<li>If you chose to extend Claude’s default internet access settings, be careful to only extend internet access to sites you trust.</li>\n<li>Monitor Claude for suspicious actions that may indicate prompt injection.</li>\n</ul>\n</blockquote>\n<p>I do not think it is fair to tell regular non-programmer users to watch out for \"suspicious actions that may indicate prompt injection\"!</p>\n<p>I'm sure they have some impressive mitigations going on behind the scenes. 
I recently learned that the summarization applied by the WebFetch function in Claude Code and now in Cowork is partly intended as a prompt injection protection layer via <a href=\"https://x.com/bcherny/status/1989025306980860226\">this tweet</a> from Claude Code creator Boris Cherny:</p>\n<blockquote>\n<p>Summarization is one thing we do to reduce prompt injection risk. Are you running into specific issues with it?</p>\n</blockquote>\n<p>But Anthropic are being honest here with their warnings: they can attempt to filter out potential attacks all they like but the one thing they can't provide is guarantees that no future attack will be found that sneaks through their defenses and steals your data (see <a href=\"https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/\">the lethal trifecta</a> for more on this.)</p>\n<p>The problem with prompt injection remains that until there's a high profile incident it's really hard to get people to take it seriously. I myself have all sorts of Claude Code usage that could cause havoc if a malicious injection got in. Cowork does at least run in a filesystem sandbox by default, which is more than can be said for my <code>claude --dangerously-skip-permissions</code> habit!</p>\n<p>I wrote more about this in my 2025 round-up: <a href=\"https://simonwillison.net/2025/Dec/31/the-year-in-llms/#the-year-of-yolo-and-the-normalization-of-deviance\">The year of YOLO and the Normalization of Deviance</a>.</p>\n<h4 id=\"this-is-still-a-strong-signal-of-the-future\">This is still a strong signal of the future</h4>\n<p>Security worries aside, Cowork represents something really interesting. 
This is a general agent that looks well positioned to bring the wildly powerful capabilities of Claude Code to a wider audience.</p>\n<p>I would be very surprised if Gemini and OpenAI don't follow suit with their own offerings in this category.</p>\n<p>I imagine OpenAI are already regretting burning the name \"ChatGPT Agent\" on their janky, experimental and mostly forgotten browser automation tool <a href=\"https://simonwillison.net/2025/Aug/4/chatgpt-agents-user-agent/\">back in August</a>!</p>\n<h4 id=\"bonus-and-a-silly-logo\">Bonus: and a silly logo</h4>\n<p>bashtoni <a href=\"https://news.ycombinator.com/item?id=46593022#46593553\">on Hacker News</a>:</p>\n<blockquote>\n<p>Simple suggestion: logo should be a cow and and orc to match how I originally read the product name.</p>\n</blockquote>\n<p>I couldn't resist <a href=\"https://gist.github.com/simonw/d06dec3d62dee28f2bd993eb78beb2ce\">throwing that one at Nano Banana</a>:</p>\n<p><img src=\"https://static.simonwillison.net/static/2026/cow-ork.jpg\" alt=\"An anthropic style logo with a cow and an ork on it\" style=\"max-width: 100%;\" /></p>\n \n <p>Tags: <a href=\"https://simonwillison.net/tags/sandboxing\">sandboxing</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/prompt-injection\">prompt-injection</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/anthropic\">anthropic</a>, <a href=\"https://simonwillison.net/tags/claude\">claude</a>, <a href=\"https://simonwillison.net/tags/ai-agents\">ai-agents</a>, <a href=\"https://simonwillison.net/tags/claude-code\">claude-code</a>, <a href=\"https://simonwillison.net/tags/lethal-trifecta\">lethal-trifecta</a>, <a href=\"https://simonwillison.net/tags/claude-cowork\">claude-cowork</a></p>",
"url": "https://simonwillison.net/2026/Jan/12/claude-cowork/#atom-everything",
"published": "2026-01-12T21:46:13.000Z",
"updated": "2026-01-12T21:46:13.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "sandboxing",
"term": "sandboxing",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "prompt-injection",
"term": "prompt-injection",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "anthropic",
"term": "anthropic",
"url": null
},
{
"label": "claude",
"term": "claude",
"url": null
},
{
"label": "ai-agents",
"term": "ai-agents",
"url": null
},
{
"label": "claude-code",
"term": "claude-code",
"url": null
},
{
"label": "lethal-trifecta",
"term": "lethal-trifecta",
"url": null
},
{
"label": "claude-cowork",
"term": "claude-cowork",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/11/dont-fall-into-the-anti-ai-hype/#atom-everything",
"title": "Don't fall into the anti-AI hype",
"description": "<p><strong><a href=\"https://antirez.com/news/158\">Don't fall into the anti-AI hype</a></strong></p>\nI'm glad someone was brave enough to say this. There is a <em>lot</em> of anti-AI sentiment in the software development community these days. Much of it is justified, but if you let people convince you that AI isn't genuinely useful for software developers or that this whole thing will blow over soon it's becoming clear that you're taking on a very real risk to your future career.</p>\n<p>As Salvatore Sanfilippo puts it:</p>\n<blockquote>\n<p>It does not matter if AI companies will not be able to get their money back and the stock market will crash. All that is irrelevant, in the long run. It does not matter if this or the other CEO of some unicorn is telling you something that is off putting, or absurd. Programming changed forever, anyway.</p>\n</blockquote>\n<p>I do like this hopeful positive outlook on what this could all mean, emphasis mine:</p>\n<blockquote>\n<p>How do I feel, about all the code I wrote that was ingested by LLMs? I feel great to be part of that, because I see this as a continuation of what I tried to do all my life: democratizing code, systems, knowledge. <strong>LLMs are going to help us to write better software, faster, and will allow small teams to have a chance to compete with bigger companies</strong>. 
The same thing open source software did in the 90s.</p>\n</blockquote>\n<p>This post has been the subject of heated discussions all day today on both <a href=\"https://news.ycombinator.com/item?id=46574276\">Hacker News</a> and <a href=\"https://lobste.rs/s/cmsfbu/don_t_fall_into_anti_ai_hype\">Lobste.rs</a>.</p>\n\n\n <p>Tags: <a href=\"https://simonwillison.net/tags/salvatore-sanfilippo\">salvatore-sanfilippo</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-assisted-programming\">ai-assisted-programming</a>, <a href=\"https://simonwillison.net/tags/ai-ethics\">ai-ethics</a></p>",
"url": "https://simonwillison.net/2026/Jan/11/dont-fall-into-the-anti-ai-hype/#atom-everything",
"published": "2026-01-11T23:58:43.000Z",
"updated": "2026-01-11T23:58:43.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "salvatore-sanfilippo",
"term": "salvatore-sanfilippo",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-assisted-programming",
"term": "ai-assisted-programming",
"url": null
},
{
"label": "ai-ethics",
"term": "ai-ethics",
"url": null
}
]
},
{
"id": "https://simonwillison.net/2026/Jan/11/answers/#atom-everything",
"title": "My answers to the questions I posed about porting open source code with LLMs",
"description": "<p>Last month I <a href=\"https://simonwillison.net/2025/Dec/15/porting-justhtml/\">wrote about porting JustHTML from Python to JavaScript</a> using Codex CLI and GPT-5.2 in a few hours while also buying a Christmas tree and watching Knives Out 3. I ended that post with a series of open questions about the ethics and legality of this style of work. Alexander Petros on <a href=\"https://lobste.rs/\">lobste.rs</a> just <a href=\"https://lobste.rs/s/cmsfbu/don_t_fall_into_anti_ai_hype#c_cqkdve\">challenged me to answer them</a>, which is fair enough! Here's my attempt at that.</p>\n<p>You can read <a href=\"https://simonwillison.net/2025/Dec/15/porting-justhtml/\">the original post</a> for background, but the short version is that it's now possible to point a coding agent at some other open source project and effectively tell it \"port this to language X and make sure the tests still pass\" and have it do exactly that.</p>\n<p>Here are the questions I posed along with my answers based on my current thinking. 
Extra context is that I've since tried variations on a similar theme a few more times using Claude Code and Opus 4.5 and found it to be <em>astonishingly</em> effective.</p>\n<h4 id=\"does-this-library-represent-a-legal-violation-of-copyright-of-either-the-rust-library-or-the-python-one\">Does this library represent a legal violation of copyright of either the Rust library or the Python one?</h4>\n<p>I decided that the right thing to do here was to <a href=\"https://github.com/simonw/justjshtml/commit/a415d0af40c34bf9a856e956d841513f482867e3\">keep the open source license and copyright statement</a> from the Python library author and treat what I had built as a derivative work, which is the entire <em>point</em> of open source.</p>\n<h4 id=\"even-if-this-is-legal-is-it-ethical-to-build-a-library-in-this-way\">Even if this is legal, is it ethical to build a library in this way?</h4>\n<p>After sitting on this for a while I've come down on yes, provided full credit is given and the license is carefully considered. Open source allows and encourages further derivative works! I never got upset at some university student forking one of my projects on GitHub and hacking in a new feature that they used. 
I don't think this is materially different, although a port to another language entirely does feel like a slightly different shape.</p>\n<h4 id=\"does-this-format-of-development-hurt-the-open-source-ecosystem\">Does this format of development hurt the open source ecosystem?</h4>\n<p>Now this one is complicated!</p>\n<p>It definitely hurts some projects because there are open source maintainers out there who say things like \"I'm not going to release any open source code any more because I don't want it used for training\" - I expect some of those would be equally angered by LLM-driven derived works as well.</p>\n<p>I don't know how serious this problem is - I've seen angry comments from anonymous usernames, but do they represent genuine open source contributors or are they just angry anonymous usernames?</p>\n<p>If we assume this is real, does the loss of those individuals get balanced out by the increase in individuals who CAN contribute to open source because they can now get work done in a few hours that might previously have taken them a few days that they didn't have to spare?</p>\n<p>I'll be brutally honest about that question: I think that if \"they might train on my code / build a derived version with an LLM\" is enough to drive you away from open source, your open source values are distinct enough from mine that I'm not ready to invest significantly in keeping you. I'll put that effort into welcoming the newcomers instead.</p>\n<p>The much bigger concern for me is the impact of generative AI on <em>demand</em> for open source. 
The recent <a href=\"https://simonwillison.net/2026/Jan/7/adam-wathan/\">Tailwind story</a> is a visible example of this - while Tailwind blamed LLMs for reduced traffic to their documentation resulting in fewer conversions to their paid component library, I'm suspicious that the reduced demand there is because LLMs make building good-enough versions of those components for free easy enough that people do that instead.</p>\n<p>I've found myself affected by this for open source dependencies too. The other day I wanted to parse a cron expression in some Go code. Usually I'd go looking for an existing library for cron expression parsing - but this time I hardly thought about that for a second before prompting one (complete with extensive tests) into existence instead.</p>\n<p>I expect that this is going to quite radically impact the shape of the open source library world over the next few years. Is that \"harmful to open source\"? It may well be. I'm hoping that whatever new shape comes out of this has its own merits, but I don't know what those would be.</p>\n<h4 id=\"can-i-even-assert-copyright-over-this-given-how-much-of-the-work-was-produced-by-the-llm\">Can I even assert copyright over this, given how much of the work was produced by the LLM?</h4>\n<p>I'm not a lawyer so I don't feel credible to comment on this one. 
My loose hunch is that I'm still putting enough creative control in through the way I direct the models for that to count as enough human intervention, at least under US law, but I have no idea.</p>\n<h4 id=\"is-it-responsible-to-publish-software-libraries-built-in-this-way\">Is it responsible to publish software libraries built in this way?</h4>\n<p>I've come down on \"yes\" here, again because I never thought it was irresponsible for some random university student to slap an Apache license on some bad code they just coughed up on GitHub.</p>\n<p>What's important here is making it very clear to potential users what they should expect from that software. I've started publishing my AI-generated and not 100% reviewed libraries as alphas, which I'm tentatively thinking of as <strong>\"alpha slop\"</strong>. I'll take the alpha label off once I've used them in production to the point that I'm willing to stake my reputation on them being decent implementations, and I'll ship a 1.0 version when I'm confident that they are a solid bet for other people to depend on. I think that's the responsible way to handle this.</p>\n<h4 id=\"how-much-better-would-this-library-be-if-an-expert-team-hand-crafted-it-over-the-course-of-several-months\">How much better would this library be if an expert team hand crafted it over the course of several months?</h4>\n<p>That one was a deliberately provocative question, because for a new HTML5 parsing library that passes 9,200 tests you would need a <em>very</em> good reason to hire an expert team for two months (at a cost of hundreds of thousands of dollars) to write such a thing. 
And honestly, thanks to the existing conformance suites this kind of library is simple enough that you may find their results weren't notably better than the one written by the coding agent.</p>\n \n <p>Tags: <a href=\"https://simonwillison.net/tags/definitions\">definitions</a>, <a href=\"https://simonwillison.net/tags/open-source\">open-source</a>, <a href=\"https://simonwillison.net/tags/ai\">ai</a>, <a href=\"https://simonwillison.net/tags/generative-ai\">generative-ai</a>, <a href=\"https://simonwillison.net/tags/llms\">llms</a>, <a href=\"https://simonwillison.net/tags/ai-assisted-programming\">ai-assisted-programming</a>, <a href=\"https://simonwillison.net/tags/ai-ethics\">ai-ethics</a>, <a href=\"https://simonwillison.net/tags/conformance-suites\">conformance-suites</a></p>",
"url": "https://simonwillison.net/2026/Jan/11/answers/#atom-everything",
"published": "2026-01-11T22:59:23.000Z",
"updated": "2026-01-11T22:59:23.000Z",
"content": null,
"image": null,
"media": [],
"authors": [
{
"name": "Simon Willison",
"email": null,
"url": null
}
],
"categories": [
{
"label": "definitions",
"term": "definitions",
"url": null
},
{
"label": "open-source",
"term": "open-source",
"url": null
},
{
"label": "ai",
"term": "ai",
"url": null
},
{
"label": "generative-ai",
"term": "generative-ai",
"url": null
},
{
"label": "llms",
"term": "llms",
"url": null
},
{
"label": "ai-assisted-programming",
"term": "ai-assisted-programming",
"url": null
},
{
"label": "ai-ethics",
"term": "ai-ethics",
"url": null
},
{
"label": "conformance-suites",
"term": "conformance-suites",
"url": null
}
]
}
]
}