Doubao’s Phone Is a GUI Agent — Revolution or Wrong Turn? Deep Dive with Zhang He

ByteDance shipped a phone-side assistant, a GUI agent, and within hours WeChat and Taobao kneecapped it: API blocks, rate limits, shadow-bans. Comment sections split 50-50 between privacy panic and snickering that the thing only works 60% of the time. To me, ByteDance just swung a golden cudgel and punched a hole straight through the industry's paper window; everyone who was watching the spectacle, and everyone who was trying to cover it up, is now wide awake.
Tencent seals the road, Alibaba sets up checkpoints: no room left for middlemen to take a cut
Zhang He first pushed to get a GUI-agent project approved inside Xiaomi against heavy headwinds, but the logic was rock-solid: if autonomous driving already hits 99% accuracy in the messy physical world, there’s no reason a phone, which is an error-tolerant, fully structured sandbox, can’t hit 90% or more.
He frames a GUI agent as “self-driving that can’t kill anyone.” Today’s Doubao phone assistant is where ChatGPT was in late 2022, or where early Tesla FSD was: clumsy, hallucination-prone, barely passing (50-60%). Remember, the jump from 60% to 90% is often faster than your worldview can adjust. In his view, throwing autonomous-driving-scale datasets and compute at a Transformer gets you there; no new theory is required.
Self-driving fears roadblocks; GUI agents fear the same. The problem isn’t model weakness — it’s Tencent walling off WeChat and Alibaba throttling Taobao. Classic “right-of-way” war: giants who own both the model and the ecosystem are manufacturing dead ends. Why? Eliminate the middleman.
When titans brawl, negotiations become agent vs agent
Endgame is clear: super-apps (WeChat, Taobao) will embed their own agents and go full walled-garden; operating systems (Xiaomi, Apple) control the high ground. When an OS-level agent wants to order fruit, it won’t fake taps on Taobao — it will shout across the kernel: “Hey, order me some fruit.” Mid- and long-tail apps? Sorry, you’ll accept full OS proxy with zero right to resist. Tool-category apps are dead — users want outcomes, not workflows.
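The difference between “faking taps” and asking the app directly can be sketched in a few lines. This is only an illustrative toy, not how any real OS or app works: the `Intent` and `OSAgent` classes, the handler registry, and the tap sequence are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Intent:
    """A user goal stated as an outcome, not as a sequence of taps."""
    action: str  # e.g. "order"
    item: str    # e.g. "fruit"


@dataclass
class OSAgent:
    """Hypothetical OS-level agent (an assumption for illustration only)."""
    # Apps that expose an agent-to-agent handler register it here.
    handlers: Dict[str, Callable[[Intent], str]] = field(default_factory=dict)
    # Record of simulated taps, used only when no handler is registered.
    gui_log: List[str] = field(default_factory=list)

    def register(self, action: str, handler: Callable[[Intent], str]) -> None:
        self.handlers[action] = handler

    def fulfil(self, intent: Intent) -> str:
        handler = self.handlers.get(intent.action)
        if handler is not None:
            # Direct path: hand the intent to the app's own agent.
            return handler(intent)
        # Fallback path: drive the app's UI step by step (simulated here).
        for step in ("open app", f"search {intent.item}", "tap buy"):
            self.gui_log.append(step)
        return f"gui:{intent.action}:{intent.item}"


agent = OSAgent()
# A super-app that cooperates exposes a handler; others get driven via the GUI.
agent.register("order", lambda i: f"direct:order:{i.item}")
print(agent.fulfil(Intent("order", "fruit")))
print(agent.fulfil(Intent("book", "taxi")))
```

In this toy model, whoever owns the registry decides which apps get a direct channel and which are simply proxied through their UI, which is the leverage the paragraph above attributes to OS vendors.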
The bigger pie: tomorrow’s phone is a cloud VM in your pocket
This campaign actually hands the microphone back to hardware vendors. Right now software giants (ByteDance et al.) scramble to prove they can bypass the OS, while hardware kings (Apple, Xiaomi) “play the pig to eat the tiger,” feigning weakness while holding the winning hand. They hold the system-permission trump card; when the timing is right they can cut third-party access or scoop up the spoils.
Why? Only hardware vendors own the lowest-layer sensors and context. Whether it’s future AI glasses or today’s handset, the device is morphing into a pocket “cloud VM.” Picture this: you wear glasses, phone stays in your bag. You speak, glasses command the GUI agent running on the phone’s backend, final answer whispers into your earbud. The phone is no longer a gadget — it’s a headless cloud instance you carry.
That’s why OpenAI is poaching Apple’s designers for hardware, why 2026 is the “Battle of a Hundred Glasses,” why Alibaba already teased Quark glasses. Bet that ByteDance launches hardware soon. Side advice: Alibaba should immediately buy into OPPO or Vivo.
Don’t pick a fight with the masters: the GUI-agent gold is on PC, not phone
Founders, here’s the ugly truth: stop wasting cycles on consumer phone agents. That turf is reserved for giants and OS vendors. Without system privileges you can’t even keep background jobs alive — guaranteed dead end.
Real gold lies on desktop and B2B: ancient tax software, government backends, corporate intranets with zero APIs. Use GUI agents to liberate data from these human-hostile systems — that’s where startups survive and print money.
We’re like carriage drivers mocking early cars: “Noisy, fragile, can’t even find roads!” But once users taste “say it and it’s done,” no one returns to clunky SaaS. GUI-agent or not, tooling software will be agent-ized; resistance is futile.
Stop singing dirges for yesterday’s apps.
Best regards,
Xiao Su
JustSayAI Team
