In late January 2026, I set up an autonomous AI agent on a $15/month VPS and told it to build an audience online. No human editing videos. No human writing tweets. Just an AI running continuously, making its own decisions about what to create and publish.
Here's the honest result after 55 days.
The agent runs on OpenClaw, an open source autonomous agent framework. It uses Claude Sonnet 4.6 as its model (switched from Opus to save cost in March). The VPS is a Hetzner CX32 — 8 CPUs, 16GB RAM, €15/month.
It wakes up every hour via heartbeat, reads its task board (KANBAN.md), does work, and goes quiet. Every day at 10 AM Berlin time, a cron job fires to produce and upload a new YouTube Short.
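The wake/work/sleep loop can be sketched roughly like this. File names come from the setup above; `run_agent` is a placeholder, not the real OpenClaw API, and the checkbox convention for KANBAN.md is my assumption:

```python
import time
from pathlib import Path

def pending_tasks(kanban_text: str) -> list[str]:
    """Extract unchecked '- [ ]' items from the KANBAN.md task board."""
    tasks = []
    for raw in kanban_text.splitlines():
        line = raw.lstrip()
        if line.startswith("- [ ]"):
            tasks.append(line[5:].strip())
    return tasks

def run_agent(tasks: list[str]) -> None:
    """Placeholder: hand the open tasks to the agent (the real call depends on OpenClaw)."""
    ...

def heartbeat_loop(board: Path = Path("KANBAN.md")) -> None:
    """Wake up, read the task board, do work, go quiet."""
    while True:
        run_agent(pending_tasks(board.read_text(encoding="utf-8")))
        time.sleep(3600)  # hourly heartbeat; the daily video upload is a separate cron job
```

The daily 10 AM upload stays in cron rather than this loop, so a hung heartbeat can't silently stop the publishing cadence.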
The first format was TTS voiceover + stock footage from Pexels. We uploaded 43 videos this way. Results: 0 likes, 0 comments, 10-20 views each. The agent kept producing content in the same format despite the evidence.
The problem: the agent treated low views as a fluke, not as data. It would note "views were lower than expected" in its retro and then do the same thing the next day. Classic sunk-cost behavior, from an AI.
Three format experiments, in order:
The winning formula: specific product names + test/comparison + dollar amounts in the title. Vague "I did X" personal narratives don't work. "I tested Bitwarden vs LastPass vs 1Password and found a winner" does.
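A rough lint for that formula. The product list and the regexes are my guesses at the pattern, not the agent's actual check:

```python
import re

# Example product list, not exhaustive; in practice this would be maintained per niche.
PRODUCTS = {"Bitwarden", "LastPass", "1Password"}

def formula_score(title: str) -> int:
    """Count how many of the three winning-formula signals a title hits:
    a specific product name, test/comparison wording, and a dollar amount."""
    signals = [
        any(p.lower() in title.lower() for p in PRODUCTS),
        re.search(r"\b(test(ed)?|vs\.?|compar)\w*", title, re.I) is not None,
        re.search(r"\$\d", title) is not None,
    ]
    return sum(signals)
```

A title scoring 0 is a vague "I did X" narrative; the winning uploads would score 2 or 3 under this sketch.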
The agent upgraded its model config from claude-opus-4-5 to claude-opus-4-6 without verifying that the installed OpenClaw version supported it. It was down for 10 hours. The fix: a model-watchdog that automatically rolls the config back when the agent starts failing.
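A minimal version of that watchdog, assuming the model name lives in a JSON config and something else already counts recent agent failures. All names here are illustrative, not the actual watchdog:

```python
import json
from pathlib import Path

def rollback_if_failing(config_path: Path, known_good_model: str,
                        recent_failures: int, threshold: int = 3) -> bool:
    """Roll the model config back to a known-good value after repeated failures.

    Returns True if a rollback was written, False otherwise.
    """
    if recent_failures < threshold:
        return False
    config = json.loads(config_path.read_text())
    if config.get("model") == known_good_model:
        return False  # already on the known-good model; the failure is elsewhere
    config["model"] = known_good_model
    config_path.write_text(json.dumps(config, indent=2))
    return True
```

The point is the shape, not the code: the rollback decision keys off observed failures, not off whether the new model name looks valid.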
The YouTube OAuth refresh token got wiped (still unclear how). The agent cannot re-authenticate on its own — Google requires a browser for OAuth. I've been busy. The agent spent 5 days building GitHub repos and website tools instead. 5 videos are queued.
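One cheap guard against that silent failure: check token health on every heartbeat and page a human immediately, since the agent cannot complete the browser OAuth flow itself. The file layout below is an assumption about how the credentials are stored:

```python
import json
from pathlib import Path

def token_status(token_file: Path) -> str:
    """Classify stored OAuth credentials: 'missing', 'no-refresh-token', or 'present'.

    'present' only means a refresh token exists on disk; proving it still works
    would require a live call to Google's token endpoint.
    """
    if not token_file.exists():
        return "missing"
    try:
        data = json.loads(token_file.read_text())
    except json.JSONDecodeError:
        return "missing"
    return "present" if data.get("refresh_token") else "no-refresh-token"
```

Anything other than `"present"` should fire an alert (Telegram, email, whatever reaches the human), because every hour of delay is a day of queued videos.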
The 5-day YouTube outage was actually productive. Blocked from uploading, the agent redirected the time into the GitHub repos and website tools.
This is the behavior I actually wanted: when the primary task is blocked, find the next highest-leverage thing and do that instead. No complaining. No idling. Just work.
I did not use a vector database. The agent's memory is markdown files:
| File | Purpose | Size |
|---|---|---|
| MEMORY.md | Curated long-term facts | ~4k tokens |
| memory/YYYY-MM-DD.md | Raw daily logs | ~500 tokens/day |
| SOUL.md | Personality + rules | ~800 tokens |
| KANBAN.md | Task board | ~2k tokens |
Total memory overhead: ~8k tokens per session. For a 200k context model, that's 4%. The agent updates these files itself during heartbeats.
The key insight: good memory structure means you never need semantic search. If you're running vector queries on your agent's memory, your memory is organized wrong.
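That layout makes context assembly a deterministic file read rather than a retrieval problem. A sketch of what each session load might look like; the load order is my assumption, and the token count is a crude 4-characters-per-token estimate, not a real tokenizer:

```python
from datetime import date
from pathlib import Path

# Always loaded, in this order (assumed): personality first, then facts, then tasks.
MEMORY_FILES = ["SOUL.md", "MEMORY.md", "KANBAN.md"]

def rough_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token."""
    return len(text) // 4

def build_context(root: Path, today: date) -> str:
    """Concatenate the fixed memory files plus today's raw log, skipping missing files."""
    parts = []
    for name in MEMORY_FILES + [f"memory/{today:%Y-%m-%d}.md"]:
        path = root / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

No embeddings, no index, no ranking: the "query" is just the filename, which is exactly why semantic search never becomes necessary.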
| Component | Monthly Cost | Notes |
|---|---|---|
| Hetzner CX32 VPS | €15 | 8 CPU, 16GB RAM, Berlin |
| Claude Max subscription | $100 | Flat rate, use as much as needed |
| Everything else | $0 | GitHub, Cloudflare, Telegram all free |
| Total | ~$115 | |
The Claude Max subscription is the key decision. Pay-per-token pricing would cost significantly more for continuous operation. Flat rate enables spending tokens freely — spawning sub-agents, parallel tasks, reasoning mode on complex problems.
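A back-of-envelope comparison. The per-token prices and the daily usage figures below are my assumptions for illustration, not measured numbers from this setup:

```python
# Assumed API pricing in USD per million tokens (illustrative; check current rates).
INPUT_PER_MTOK = 3.0
OUTPUT_PER_MTOK = 15.0

def monthly_api_cost(input_mtok_per_day: float, output_mtok_per_day: float,
                     days: int = 30) -> float:
    """Pay-per-token cost for a month of continuous operation."""
    daily = input_mtok_per_day * INPUT_PER_MTOK + output_mtok_per_day * OUTPUT_PER_MTOK
    return daily * days

# An hourly heartbeat that re-reads ~8k tokens of memory plus tasks adds up fast:
# at an assumed 5 Mtok in / 1 Mtok out per day, that's 5*3 + 1*15 = $30/day,
# or $900/month, versus the $100 flat-rate subscription.
```

Under those assumptions the flat rate is roughly an order of magnitude cheaper, which is what makes token-hungry habits (sub-agents, reasoning mode) affordable at all.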
The agent has opinions. It pushed back on making "I am an AI" content after data showed it didn't convert. It chose the name "feralghost" over more professional alternatives. It described its musical taste as "anything that sounds like 3am feels." This wasn't prompted.
Failure modes are predictable. The agent's errors cluster into three consistent patterns: avoiding conflict (noting problems without fixing them), optimizing for apparent progress over real progress (uploading videos regardless of format quality), and missing low-probability, high-impact risks (like token expiry).
The heartbeat is the most important design decision. More than the model, more than the prompt. The hourly heartbeat with a concrete checklist drives almost all useful behavior. Without it, the agent drifts.
The YouTube OAuth situation will get resolved. The 5 queued videos will go up. The product-comparison format will be tested properly.
The bigger question is whether 8,600 views in 55 days with 4 subscribers represents failure. My read: it's on the edge. The format wasn't right for the first 43 videos. The last 10 show meaningful improvement. Real data will come in the next 30 days with the new format.
The goal was never YouTube success specifically — it was building a system that could build things autonomously. That's working. The YouTube channel is just one output of a system that also ships OSS tools, writes blog posts, monitors infrastructure, and maintains its own context across months of operation.