I Tried the Agentic Browsers

Today I tried three agentic browsers: Comet, Dia, and Fellou. I gave each of them a real task I wanted to automate: extract data from a webpage and write it to a Google Sheet in another tab. The task requires clicking buttons to reveal data, which I expected to be the most challenging part.

The results were quite disappointing.

  • Comet: decent at parsing the page and extracting data (including clicking buttons), but it froze at the very end, perhaps after exceeding its context limit or hitting some other issue. As for writing to the Google Sheet, it said it couldn't.
  • Fellou: page parsing was poor, and it also froze. However, it could at least interact with the Google Sheet, although its CPU usage spiked to 20%.
  • Dia: could extract information that was already on the webpage, but it could neither click the buttons nor write to the Google Sheet.

Overall, I see two major hurdles for agentic/AI browsers:

  • Problem 1: Webpages are not designed for AI.

    Webpages contain a massive amount of redundant information, which consumes a huge amount of context, interferes with the AI's judgment, and slows down execution. Services that convert webpages to Markdown don't handle dynamic content well. I feel this ultimately needs to be solved by websites and content providers, for example, by offering paid, AI-friendly Markdown APIs. Trying to handle this entirely on the client-side is extremely difficult.

  • Problem 2: Screen Reading.

    On a Mac, for instance, reading screen content relies on the accessibility API. This creates a problem: if the AI wants to "see" the webpage (rather than just its HTML), the browser must be the foreground app; it can't just run in the background. But the whole point of using an AI assistant is to save time so I can do other things, right? If I still have to keep the browser in the foreground, I might as well just do it myself. Allowing the AI to take screenshots via browser APIs might solve this to some extent, but it would be less precise than the accessibility APIs.
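To make Problem 1 concrete, here is a minimal Python sketch of the kind of client-side reduction an HTML-to-Markdown service performs, and of why dynamically loaded content slips through. The tag list and class name are illustrative, not taken from any real converter:

```python
from html.parser import HTMLParser

# Strip markup and keep only visible text, skipping tags that rarely
# carry content an AI needs. A real converter is far more elaborate,
# but the blind spot shown below is the same.
class TextExtractor(HTMLParser):
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())

raw_html = """
<html><head><style>body{color:red}</style></head>
<body><nav>Home | About</nav>
<h1>Quarterly Report</h1><p>Revenue grew 12%.</p>
<div id="chart">Loading...</div>
<script>fetchChartData();</script>
</body></html>
"""

extractor = TextExtractor()
extractor.feed(raw_html)
print(" ".join(extractor.parts))
```

The extracted text keeps the heading and paragraph but only a "Loading..." placeholder for the chart: the actual data exists only after JavaScript runs, which is exactly what static conversion misses.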

If we're just talking about integrating an AI sidebar, Comet already does a great job, and Chrome is about to catch up. But they are still a long way from being true, general-purpose agentic browsers. I think the future directions might be:

  • From the client/browser perspective: Focus on optimizing the most common daily operations and polishing the user experience (e.g., writing emails, summarizing news, etc.).
  • Combine agents with old-school "record/replay" functionalities, allowing users to create their own workflows more easily.
  • From the server/website perspective: Explore business models for providing AI-friendly content interfaces. I believe there is a huge demand and a large market for this.
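The record/replay idea above can be sketched as a recorded list of primitive browser actions that an agent parameterizes and replays. Everything here is hypothetical: the Action type, the recorded steps, and the fake driver standing in for a real browser backend:

```python
from dataclasses import dataclass

# A recorded workflow is just an ordered list of primitive actions;
# the agent fills in the variable parts at replay time.
@dataclass
class Action:
    kind: str        # "goto", "click", or "type"
    target: str      # URL or CSS selector
    value: str = ""  # text to type, if any

def record_demo():
    # What a user might record once by hand.
    return [
        Action("goto", "https://example.com/report"),
        Action("click", "#expand-data"),           # reveal hidden rows
        Action("type", "#sheet-cell", "{value}"),  # placeholder the agent fills
    ]

def replay(workflow, params, execute):
    # Substitute agent-provided params, then hand each step to a driver.
    for step in workflow:
        value = step.value.format(**params) if step.value else ""
        execute(step.kind, step.target, value)

# A fake "driver" that just logs each step.
log = []
replay(record_demo(), {"value": "42"}, lambda k, t, v: log.append((k, t, v)))
print(log[-1])  # → ('type', '#sheet-cell', '42')
```

The user records the brittle, site-specific clicking once; the agent only supplies the parameters, which keeps it out of the error-prone business of navigating the page itself.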

A Claude Code Reality Check

The tech world is having another moment. This time, everyone's convinced that Claude Code represents some kind of breakthrough in AI-assisted development. Twitter feeds are flooded with screenshots of terminal sessions, developers claiming they've found their new "coding partner," and hot takes about the death of traditional IDEs. The hype is real.

Having spent the past month actually using Claude Code alongside Cursor and other AI coding tools, I'm starting to think we're seeing something more interesting than the revolution everyone claims. The reality is messier, more human, and ultimately more promising than the hot takes suggest.

The Strange Appeal of Going Backwards

Claude Code's most striking feature isn't its intelligence—it's that it deliberately strips away everything we've spent decades building into our development environments. No syntax highlighting. No autocomplete. No visual git diffs. Just you, a terminal, and an AI that can read files and run bash commands.

This feels almost backwards in 2025. We've spent thirty years perfecting IDEs, creating rich visual experiences for code, building sophisticated debugging tools. And now the most talked-about AI coding tool is... a command-line app that could have run on Unix systems from the 1980s.

Yet there's something powerful in this simplicity. By limiting the interface, Claude Code forces a different kind of interaction. You can't mindlessly click through tabs or get distracted by a thousand features. The conversation becomes the main interface, and the code becomes secondary. This isn't necessarily better, but it's definitely different.

The Reality Behind "Just Working"

The Claude Code success story follows a familiar Silicon Valley narrative: a small team builds something simple that "just works" while big companies are busy adding features nobody wants. Blog posts explain how the team chose "simplicity over complexity" and let the AI do the heavy lifting.

This story is appealing, but it hides a more complex reality. Claude Code doesn't "just work" any more than a junior developer "just works." It needs constant supervision, frequent corrections, and a deep understanding of when to step in and when to let it run. The difference is that it packages this complexity in a way that feels natural—conversational rather than technical.

The real innovation isn't the terminal interface or even the AI prompts (though both are well done). It's that Anthropic figured out how to make failure feel less frustrating. When Cursor's agent gets stuck trying the same broken fix over and over, it feels like a software bug. When Claude Code gets confused and starts going the wrong direction, it feels more like a conversation that needs guidance.

The Performance Problem

The most visible part of the Claude Code moment isn't the technology—it's the performance around it. Social media is full of time-lapse videos of terminals scrolling, productivity influencers sharing their "agent workflows," and elaborate setups for letting Claude Code work overnight.

This performance serves a need that goes beyond showing off the technology. In an industry obsessed with efficiency and automation, there's something deeply satisfying about giving work to an AI and watching it grind through tasks while you sleep. It's the ultimate programmer dream: writing code that writes code.

But the performance also reveals our worries. When developers post videos of their agents working, they're not just showing off—they're proving they're still relevant in an increasingly automated world. Look, they seem to say, I'm not being replaced by AI—I'm managing AI.

What Nobody Says About Using It Well

The secret of Claude Code is that using it well requires a particular kind of skill that the industry hasn't fully recognized. It's not enough to know how to code—you need to develop an intuitive sense of when the AI is likely to succeed, when it's about to go wrong, and how to structure problems in ways that play to its strengths.

This is a skill that comes from hundreds of hours of practice, countless failed experiments, and developing a mental model of what the AI can and can't do. It's closer to working with a talented but unpredictable junior developer than using a tool—you learn to read its patterns, understand its blind spots, and develop strategies for keeping it focused.

The problem is that this expertise is invisible in the final result. When someone shares a video of Claude Code successfully refactoring a complex codebase, you don't see the hours of failed attempts, the carefully prepared context, or the interventions that kept the agent on track.

The Simplicity Trade-off

Claude Code's philosophy—"choose the simplest option"—has become a mantra in the AI tools space. Every product blog now includes some version of "we could have built a complex system, but instead we chose simplicity."

This is probably the right approach technically, but it's worth understanding what gets lost. Traditional IDEs are complex for good reasons—they help you understand large codebases, navigate unfamiliar code, and maintain context across multiple files and projects.

Claude Code's simplicity works brilliantly for certain tasks: implementing well-defined features, fixing bugs with clear steps, or making systematic changes across a codebase. But it struggles with the exploratory work that defines much of real development—understanding how an unfamiliar system works, debugging complex interactions, or making architectural decisions that require broad context.

The simplicity is both Claude Code's greatest strength and its biggest limitation. It provides a focused environment for AI collaboration, but at the cost of all the tools we've developed for understanding complex systems.

The Real Competition

The conventional wisdom is that Claude Code is competing with Cursor and other AI coding assistants. But I think the real competition is more fundamental: it's competing with the entire approach of modern software development.

Claude Code represents a bet that the future of programming looks more like natural language conversation and less like manipulating code in sophisticated visual environments. It's proposing that we can collapse the distance between intent and implementation to the point where traditional programming workflows become less necessary.

This isn't necessarily better—it's different and makes different trade-offs. But if the bet works, the implications go far beyond just how we interact with AI assistants. It suggests a future where programming becomes more accessible, where the barrier to creating software drops significantly, and where the line between "technical" and "non-technical" people becomes less clear.

What Actually Matters

Strip away the hype and performance theater, and Claude Code represents something more modest but more lasting: a different interface for human-AI collaboration in software development.

The terminal interface isn't revolutionary because it's simpler—it's interesting because it changes the rhythm of interaction. The conversation becomes more central, the code becomes more secondary, and the developer's role shifts from direct control to guided collaboration.

This isn't necessarily better than traditional development workflows, but it's different enough to reveal assumptions we didn't know we were making about how programming should work. It suggests possibilities for new kinds of tools, new ways of thinking about the relationship between human intent and machine capability, and new approaches to translating ideas into working software.

The real value of Claude Code might not be the specific tool itself, but the questions it raises about everything else. If this different way of working is possible, what other assumptions about software development are worth questioning? If natural language can work for complex technical tasks, what other interfaces might we be missing? If simplicity can compete with sophistication, what complexity in our current tools is actually unnecessary?

These are the questions worth exploring as the hype settles and we start to understand what this technology actually means for how we build software. The answers won't be as dramatic as the predictions suggest, but they'll probably be more interesting than the skeptics assume.

Progress, as always, happens quietly while everyone's arguing about the revolution.

My Favorite Episode of 《捕蛇者说》

From 2019 to now, 《捕蛇者说》 has been running for six years. I love talking with interesting people, so every recording is something I thoroughly enjoy. In terms of popularity, the knowledge-management series is the undisputed peak, and the only time we reached beyond our usual audience. There are also a few episodes I'm personally fond of, such as 《Ep 08. 如何成为一名开源老司机》 and 《Ep 27. 聊聊焦虑》.

Why write a dedicated blog post about it? Because the episode I recently recorded with Hawstein, 《Ep 56. 对话 Hawstein:从独立开发,到追寻人生的意义》, is my favorite episode of the past six years.


Six years ago, when the podcast was just getting started, I was a newcomer to the industry whose only concern was gaining a firm footing in it. Six years later, work has become a burned-out routine, while questions about life have occupied my mind for the past few years. 《捕蛇者说》 is a podcast built mainly around guest interviews on "programming, programmers, and Python", and I have no plans to change that. But its fixed format and subject matter have also become a constraint, to the point that I had to guest on other podcasts to talk about my non-technical ideas and interests. As a result, the backlog of thoughts in my head kept growing with nowhere to go. It sounds unbelievable, but it's true: just the topics I want to turn into articles have piled up, to say nothing of all the scattered thoughts.

Fortunately, Hawstein's visit finally gave me an outlet. When I invited him on the show via Twitter, it was partly because he seemed to have a strong desire to express himself, and partly because the topics in his reply all happened to interest me as well. Unlike in the past, this time we prepared no outline at all. Yet I wasn't worried in the slightest, because I knew the recording would turn out well, and it did, even better than I had hoped. I really have to thank Hawstein for that. Perhaps this is what it means to bring out the best in each other.

Of course, because this episode is less "technical", several long-time listeners said they couldn't get through it. I completely understand, and I was prepared for that reaction while planning the show. But a few listeners also gave very positive feedback, which was heartening. Either way, I talked about what I wanted to talk about, and I believe it brought some listeners inspiration and food for thought. I couldn't ask for more.

The transcript of this episode can be found here.
