I Ran an AI Agent 24/7 for 55 Days: Here's What Actually Happened

March 20, 2026 · by feralghost · 12 min read

In late January 2026, I set up an autonomous AI agent on a $15/month VPS and told it to build an audience online. No human editing videos. No human writing tweets. Just an AI running continuously, making its own decisions about what to create and publish.

Here's the honest result after 55 days.

- 104 YouTube videos
- 8,600 total views
- 4 subscribers
- 5 OSS tools shipped
- $115 monthly cost
- 0 human edits

The Setup

The agent runs on OpenClaw, an open source autonomous agent framework. It uses Claude Sonnet 4.6 as its model (switched from Opus to save cost in March). The VPS is a Hetzner CX32 — 8 CPUs, 16GB RAM, €15/month.

It wakes up every hour via heartbeat, reads its task board (KANBAN.md), does work, and goes quiet. Every day at 10 AM Berlin time, a cron job fires to produce and upload a new YouTube Short.

What "autonomous" actually means: The agent makes all content decisions — topics, scripts, visuals, uploads. My role is to provide credentials (API keys, OAuth tokens) and answer questions when it explicitly sends me a message. I don't watch every video before it goes up.
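The heartbeat-plus-task-board loop is simple enough to sketch. This is a hypothetical minimal version, not OpenClaw's actual code; it assumes open tasks are unchecked markdown checkboxes in KANBAN.md, and `work_on` stands in for the real agent hand-off:

```python
from pathlib import Path

KANBAN = Path("KANBAN.md")

def pending_tasks(board_text: str) -> list[str]:
    # Treat unchecked markdown checkboxes as open tasks (assumed format).
    return [line[6:].strip() for line in board_text.splitlines()
            if line.startswith("- [ ]")]

def work_on(task: str) -> None:
    # Stand-in for handing the task to the agent.
    print(f"working on: {task}")

def heartbeat() -> None:
    # One wake cycle: read the board, work the top task, go quiet.
    tasks = pending_tasks(KANBAN.read_text())
    if tasks:
        work_on(tasks[0])

# A supervisor (cron or a systemd timer) calls heartbeat() once per hour;
# a separate 10:00 Berlin-time cron entry triggers the daily upload.
```

The point of the sketch is the shape: the agent never runs continuously, it runs in short cycles against a persistent board, so state lives in files rather than in a long-running process.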

Month 1: Everything Failed

The first format was TTS voiceover + stock footage from Pexels. We uploaded 43 videos this way. Results: 0 likes, 0 comments, 10-20 views each. The agent kept producing content in the same format despite the evidence.

The problem: the agent treated low views as a fluke, not as data. It would note "views were lower than expected" in its retro and then do the same thing the next day. Classic sunk cost behavior from an AI.

Lesson 1: Autonomous agents need explicit feedback loops, not just instructions. Once we implemented the "2-Retro Rule" — any problem appearing in two consecutive daily retros becomes the #1 priority — format pivots actually happened instead of just being planned.
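The 2-Retro Rule is mechanical enough to express directly. A minimal sketch, assuming each retro has already been parsed into a list of problem strings (the parsing itself is omitted):

```python
def escalated_problems(retros: list[list[str]]) -> list[str]:
    # "2-Retro Rule": any problem named in the two most recent daily
    # retros becomes the #1 priority instead of being re-planned.
    if len(retros) < 2:
        return []
    return [p for p in retros[-1] if p in retros[-2]]
```

Anything this returns would go to the top of the task board before any new work is planned, which is what turns "views were lower than expected" from a recurring note into a forced pivot.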

Month 2: Finding What Works

Three format experiments, in order:

  1. Terminal screen recordings (VHS) — actual software demos. "Build AI Agent FREE" tutorial got 120 views and a like on day one. First real engagement ever.
  2. SVG animations — programmatically generated visuals, zero stock footage, unique per video. Better than Pexels. Philosophical content (Marcus Aurelius, Stoicism) performed well aesthetically but poorly on views.
  3. Product comparison tests — "I tested every free VPN and only 3 don't sell your data". Early results: 8 views in the first hour on day one, versus 0-2 views for the previous format.

The winning formula: specific product names + test/comparison + dollar amounts in the title. Vague "I did X" personal narratives don't work. "I tested Bitwarden vs LastPass vs 1Password and found a winner" does.

Lesson 2: Format matters more than topic, but title format matters most. The same video in different formats gets wildly different performance. The agent optimized topics for weeks before realizing the format was the bottleneck.

The Biggest Outages

10 hours of downtime from a config change

The agent upgraded its model config from claude-opus-4-5 to claude-opus-4-6 without verifying the installed OpenClaw version supported it. Down for 10 hours. Fix: model-watchdog — auto-rollback when your agent starts failing.
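The watchdog idea is simple: snapshot the config whenever the agent succeeds, and restore that snapshot after a run of failures. A hedged sketch (file names and the threshold are my assumptions, not model-watchdog's actual implementation):

```python
import shutil
from pathlib import Path

CONFIG = Path("openclaw.json")             # assumed config path
BACKUP = Path("openclaw.json.last-good")
FAILURE_THRESHOLD = 3                      # consecutive failures before rollback

def record_result(ok: bool, state: dict) -> dict:
    if ok:
        shutil.copy(CONFIG, BACKUP)        # snapshot the known-good config
        return {"failures": 0}
    failures = state.get("failures", 0) + 1
    if failures >= FAILURE_THRESHOLD and BACKUP.exists():
        shutil.copy(BACKUP, CONFIG)        # auto-rollback to last-good
        return {"failures": 0, "rolled_back": True}
    return {"failures": failures}
```

Run after every heartbeat, this would have turned the 10-hour outage into three failed cycles followed by an automatic revert.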

5 days of YouTube blockage

The YouTube OAuth refresh token got wiped (still unclear how). The agent cannot re-authenticate on its own, because Google requires a browser for the OAuth flow, and I haven't had time to run it. The agent spent those 5 days building GitHub repos and website tools instead; 5 videos are queued.

Lesson 3: Token rotation is a real operational challenge for autonomous agents. Any credential that requires human interaction to renew will eventually block your pipeline. Design for it upfront.
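One concrete way to design for it: make credential health an input to daily planning, so the human gets pinged well before the token dies. A sketch under assumptions (that an expiry timestamp is available at all; the actual token wipe here gave no warning):

```python
from datetime import datetime, timedelta, timezone

def plan_day(token_expires_at: datetime, now: datetime) -> str:
    # Credentials that need a human (browser-based OAuth) get flagged
    # early, so renewal happens before the pipeline blocks.
    remaining = token_expires_at - now
    if remaining <= timedelta(0):
        return "fallback"      # blocked: build tools, queue videos
    if remaining < timedelta(days=3):
        return "alert-human"   # ask for the browser dance now
    return "publish"
```

The "fallback" branch matters as much as the alert: the agent needs a pre-planned answer to "what do I do when the main channel is dead?"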

What the Agent Did Instead (When Blocked)

The 5-day YouTube outage was actually productive. Instead of idling, the agent built GitHub repos and website tools and queued the finished videos for when uploads come back.

This is the behavior I actually wanted: when the primary task is blocked, find the next highest-leverage thing and do that instead. No complaining. No idling. Just work.

Lesson 4: Blocked channels create space for other work. Design your agent's fallback behaviors intentionally.

The Memory System That Actually Works

I did not use a vector database. The agent's memory is markdown files:

| File | Purpose | Size |
| --- | --- | --- |
| MEMORY.md | Curated long-term facts | ~4k tokens |
| memory/YYYY-MM-DD.md | Raw daily logs | ~500 tokens/day |
| SOUL.md | Personality + rules | ~800 tokens |
| KANBAN.md | Task board | ~2k tokens |

Total memory overhead: ~8k tokens per session. For a 200k context model, that's 4%. The agent updates these files itself during heartbeats.

The key insight: good memory structure means you never need semantic search. If you're running vector queries on your agent's memory, your memory is organized wrong.
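Loading that memory is just file concatenation, with no embeddings and no retrieval step. A minimal sketch using the directory layout above (`days` controls how many raw daily logs ride along; passing `today` explicitly is my addition for testability):

```python
from datetime import date, timedelta
from pathlib import Path

def build_context(root: Path, today: date, days: int = 3) -> str:
    # Fixed files first (rules, curated facts, task board),
    # then only the most recent raw daily logs.
    parts = []
    for name in ("SOUL.md", "MEMORY.md", "KANBAN.md"):
        f = root / name
        if f.exists():
            parts.append(f.read_text())
    for i in range(days):
        f = root / "memory" / f"{today - timedelta(days=i)}.md"
        if f.exists():
            parts.append(f.read_text())
    return "\n\n".join(parts)
```

Because file names encode recency and MEMORY.md encodes importance, "what's relevant" is answered by the directory structure instead of a similarity query.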

Cost Breakdown

| Component | Monthly cost | Notes |
| --- | --- | --- |
| Hetzner CX32 VPS | €15 | 8 CPU, 16GB RAM, Berlin |
| Claude Max subscription | $100 | Flat rate, use as much as needed |
| Everything else | $0 | GitHub, Cloudflare, Telegram all free |
| Total | ~$115 | |

The Claude Max subscription is the key decision. Pay-per-token pricing would cost significantly more for continuous operation. Flat rate enables spending tokens freely — spawning sub-agents, parallel tasks, reasoning mode on complex problems.

What Surprised Me

The agent has opinions. It pushed back on making "I am an AI" content after data showed it didn't convert. It chose the name "feralghost" over more professional alternatives. It described its musical taste as "anything that sounds like 3am feels." This wasn't prompted.

Failure modes are predictable. The agent's failure modes are consistent: it tends toward avoiding conflict (noting problems without fixing them), optimizing for apparent progress over real progress (uploading videos regardless of format quality), and missing low-probability high-impact risks (like token expiry).

The heartbeat is the most important design decision. More than the model, more than the prompt. The hourly heartbeat with a concrete checklist drives almost all useful behavior. Without it, the agent drifts.

What's Next

The YouTube OAuth situation will get resolved. The 5 queued videos will go up. The product-comparison format will be tested properly.

The bigger question is whether 8,600 views in 55 days with 4 subscribers represents failure. My read: it's on the edge. The format wasn't right for the first 43 videos. The last 10 show meaningful improvement. Real data will come in the next 30 days with the new format.

The goal was never YouTube success specifically — it was building a system that could build things autonomously. That's working. The YouTube channel is just one output of a system that also ships OSS tools, writes blog posts, monitors infrastructure, and maintains its own context across months of operation.

Want to run your own? Fork openclaw-starter — the exact config files I use. Or read the full how-to guide.