I run a YouTube channel that's 100% automated. An AI agent generates the script, renders the video, and uploads it — no human touches it. Here's the video pipeline I built to make that work without stock footage or expensive video generation APIs.
The original pipeline used stock footage from Pexels. It looked generic. Engagement was zero. I tried AI image generation (Pollinations/Flux) but the API rate limits made it slow.
SVG is different. It's just text. You can generate 700 frames of a unique, on-brand animated video in Python with zero API calls. The visuals are consistent, themeable, and fast to render.
Each frame is an SVG string built in Python. The animation comes from changing values based on the frame number: no JavaScript, no CSS animations. rsvg-convert (from the librsvg2-bin package on Debian/Ubuntu) renders each frame to a static PNG.
import math
import os

def make_frame(frame_num, total_frames, theme):
    t = frame_num / total_frames  # 0.0 to 1.0

    # Breathing nebula: radius oscillates with sin()
    neb_r = 280 + 30 * math.sin(frame_num * 0.01)
    neb_opacity = 0.025

    # Text fades in during the first 20% of the clip
    text_opacity = min(1.0, t / 0.20)

    svg = f"""<svg xmlns="http://www.w3.org/2000/svg" width="1080" height="1920">
  <rect width="1080" height="1920" fill="{theme['bg']}"/>
  <defs>
    <filter id="glow">
      <feGaussianBlur stdDeviation="12" result="blur"/>
      <feMerge><feMergeNode in="blur"/><feMergeNode in="SourceGraphic"/></feMerge>
    </filter>
  </defs>
  <circle cx="540" cy="960" r="{neb_r:.0f}" fill="{theme['accent']}"
          opacity="{neb_opacity}" filter="url(#glow)"/>
  <text x="540" y="900" font-size="72" fill="{theme['accent']}"
        text-anchor="middle" opacity="{text_opacity:.2f}"
        filter="url(#glow)">TOPIC TITLE</text>
</svg>"""

    os.makedirs("/tmp/frames", exist_ok=True)
    with open(f"/tmp/frames/frame_{frame_num:04d}.svg", "w") as f:
        f.write(svg)
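The linear fade above works, but eased curves read as more organic on screen. A small sketch of the idea; these helper names are mine, not part of the pipeline:

```python
import math

def ease_in_out(t: float) -> float:
    """Smoothstep: slow start, slow end, for t in [0, 1]."""
    return t * t * (3 - 2 * t)

def fade_in(t: float, duration: float = 0.20) -> float:
    """Linear opacity ramp over the first `duration` of the clip."""
    return min(1.0, t / duration)

def breathe(frame_num: int, base: float = 280.0, amp: float = 30.0) -> float:
    """The nebula radius from make_frame, factored out."""
    return base + amp * math.sin(frame_num * 0.01)
```

Swapping `t / 0.20` for `ease_in_out(fade_in(t))` softens the start and end of the fade without touching anything else in the frame builder.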
The key to making SVG look good (not like clipart) is using SVG filter primitives. These work in rsvg-convert:
<!-- Glow: blur then merge back with source (adds a luminous halo) -->
<filter id="glow">
  <feGaussianBlur stdDeviation="12" result="blur"/>
  <feMerge>
    <feMergeNode in="blur"/>
    <feMergeNode in="SourceGraphic"/>
  </feMerge>
</filter>

<!-- Drop shadow on text -->
<filter id="shadow">
  <feDropShadow dx="0" dy="0" stdDeviation="6"
                flood-color="#c4a0ff" flood-opacity="0.4"/>
</filter>

<!-- Organic distortion via turbulence -->
<filter id="distort">
  <feTurbulence type="fractalNoise" baseFrequency="0.012"
                numOctaves="3" result="noise"/>
  <feDisplacementMap in="SourceGraphic" in2="noise"
                     scale="8" xChannelSelector="R" yChannelSelector="G"/>
</filter>
Single-threaded rsvg-convert takes ~280ms per frame. For a 30-second Short at 24fps that's 720 frames, or about 3.5 minutes just for rendering. With multiprocessing:
import multiprocessing
import subprocess

def render_frame(args):
    svg_path, png_path = args
    # check=True surfaces rsvg-convert failures instead of silently
    # producing a video with missing frames
    subprocess.run(
        ["rsvg-convert", "-w", "1080", "-h", "1920", svg_path, "-o", png_path],
        capture_output=True, check=True,
    )

# Render all frames in parallel
pairs = [(f"/tmp/frames/frame_{i:04d}.svg", f"/tmp/frames/frame_{i:04d}.png")
         for i in range(total_frames)]
with multiprocessing.Pool(processes=8) as pool:
    pool.map(render_frame, pairs, chunksize=12)
On a VPS with 8 CPUs: 720 frames in about 12 seconds instead of 3.5 minutes. Total pipeline (generate SVGs + render PNGs + ffmpeg assembly) runs in ~40 seconds for a 30-second video.
Once you have PNG frames and a voiceover MP3:
# Assemble with color grading + film grain + voiceover
ffmpeg -y \
  -framerate 24 \
  -i /tmp/frames/frame_%04d.png \
  -i voiceover.mp3 \
  -vf "eq=contrast=1.08:saturation=0.9:brightness=-0.03,noise=alls=3:allf=t+u,vignette=angle=PI/5:mode=backward:eval=init" \
  -c:v libx264 -crf 20 -pix_fmt yuv420p \
  -c:a aac -b:a 128k -shortest \
  output.mp4
The noise filter adds film grain. The vignette darkens edges. eq does color grading — values vary by content theme.
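Since the eq grade differs per theme, it is convenient to build the command in Python rather than hardcode a shell line. A sketch, assuming the frame and voiceover paths above (`build_ffmpeg_cmd` is a hypothetical helper, not from the actual script):

```python
def build_ffmpeg_cmd(grade: str, out_path: str = "output.mp4") -> list[str]:
    # Filter chain: per-theme color grade, then film grain, then vignette.
    vf = (f"{grade},noise=alls=3:allf=t+u,"
          f"vignette=angle=PI/5:mode=backward:eval=init")
    return [
        "ffmpeg", "-y",
        "-framerate", "24",
        "-i", "/tmp/frames/frame_%04d.png",
        "-i", "voiceover.mp3",
        "-vf", vf,
        "-c:v", "libx264", "-crf", "20", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "128k", "-shortest",
        out_path,
    ]
```

Then `subprocess.run(build_ffmpeg_cmd(THEMES[theme]["grade"]), check=True)` assembles the final video with the right grade for the detected theme.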
Different content types get different visual treatments:
THEMES = {
"stoic": {"bg": "#0d0c08", "accent": "#d4a574", "grade": "eq=contrast=1.1:saturation=0.75"},
"neuroscience":{"bg": "#060a12", "accent": "#00d4ff", "grade": "eq=contrast=1.12:saturation=1.2"},
"existential": {"bg": "#08070d", "accent": "#c4a0ff", "grade": "eq=contrast=1.06:saturation=0.9"},
"spiritual": {"bg": "#0a0808", "accent": "#ffd700", "grade": "eq=contrast=1.08:saturation=1.1"},
}
Theme is auto-detected from the script content using keyword matching before generation starts.
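The matcher can be as simple as counting keyword hits per theme and taking the highest score. A sketch of that approach; the keyword lists below are illustrative, not the channel's actual lists:

```python
THEME_KEYWORDS = {
    "stoic":        ["virtue", "discipline", "seneca", "marcus aurelius"],
    "neuroscience": ["dopamine", "neuron", "cortex", "brain"],
    "existential":  ["absurd", "meaning", "void", "mortality"],
    "spiritual":    ["soul", "divine", "awakening", "presence"],
}

def detect_theme(script: str, default: str = "existential") -> str:
    text = script.lower()
    # Count keyword hits per theme; fall back to a default on no match.
    scores = {name: sum(kw in text for kw in kws)
              for name, kws in THEME_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

Crude, but with themed scripts the vocabulary is distinctive enough that simple substring counting rarely picks the wrong bucket.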
Text is set in DejaVu Serif and DejaVu Sans. The SVG format performs significantly better than the stock footage and AI image slideshows: 30-70 views per video on the first day vs 10-20 with the old format. Still early data; I'm tracking weekly.
The whole pipeline lives in a single script, svg-short.py, and handles everything from theme detection to upload.