How I Generate YouTube Shorts with Python, SVG, and ffmpeg

March 2026 · by feralghost · 8 min read

I run a YouTube channel that's 100% automated. An AI agent generates the script, renders the video, and uploads it — no human touches it. Here's the video pipeline I built to make that work without stock footage or expensive video generation APIs.

TL;DR: Generate animated SVG frames in Python → render to PNG in parallel (7x faster) → assemble with ffmpeg + TTS voiceover + film grain. Total cost: $0. Speed: 40 seconds for a 30-second Short.

Why SVG?

The original pipeline used stock footage from Pexels. It looked generic. Engagement was zero. I tried AI image generation (Pollinations/Flux) but the API rate limits made it slow.

SVG is different. It's just text. You can generate over 700 frames of a unique, on-brand animated video in Python with zero API calls. The visuals are consistent, themeable, and fast to render.

The Stack

- Python: builds each frame as an SVG string
- rsvg-convert (librsvg): rasterizes each SVG to a 1080x1920 PNG
- multiprocessing: renders frames in parallel across CPU cores
- ffmpeg: assembles the PNGs, TTS voiceover, and color grade into the final MP4

Generating SVG Frames

Each frame is an SVG string built in Python. The animation is achieved by changing values based on the frame number — no JavaScript, no CSS animations. rsvg-convert renders each frame as a static PNG.

import math, os

def make_frame(frame_num, total_frames, theme):
    t = frame_num / total_frames  # 0.0 to 1.0

    # Breathing nebula — size changes with sin()
    neb_r = 280 + 30 * math.sin(frame_num * 0.01)
    neb_opacity = 0.025

    # Text fades in over the first 20%, then holds at full opacity
    text_opacity = min(1.0, t / 0.20)

    svg = f"""<svg xmlns="http://www.w3.org/2000/svg" width="1080" height="1920">
  <rect width="1080" height="1920" fill="{theme['bg']}"/>
  <defs>
    <filter id="glow">
      <feGaussianBlur stdDeviation="12" result="blur"/>
      <feMerge><feMergeNode in="blur"/><feMergeNode in="SourceGraphic"/></feMerge>
    </filter>
  </defs>
  <circle cx="540" cy="960" r="{neb_r:.0f}" fill="{theme['accent']}"
    opacity="{neb_opacity}" filter="url(#glow)"/>
  <text x="540" y="900" font-size="72" fill="{theme['accent']}"
    text-anchor="middle" opacity="{text_opacity:.2f}"
    filter="url(#glow)">TOPIC TITLE</text>
</svg>"""

    os.makedirs("/tmp/frames", exist_ok=True)
    with open(f"/tmp/frames/frame_{frame_num:04d}.svg", "w") as f:
        f.write(svg)
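One pitfall with f-string SVG: script-generated titles can contain characters like &, which produce malformed XML that rsvg-convert rejects. A small guard like this (my addition, not part of the original pipeline; the names are mine) catches that before render time:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def safe_title(text):
    # Script-generated titles can contain &, <, > -- escape them before
    # interpolating into the SVG f-string, or the XML is malformed.
    return escape(text)

def validate_frame(svg_text):
    # Cheap well-formedness check before spending render time on a frame.
    ET.fromstring(svg_text)  # raises ParseError if the SVG is broken
    return True
```

Escaping once at the title boundary is cheaper than debugging a single failed frame out of 720 after the render pass.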

SVG Filters That Actually Work

The key to making SVG look good (not like clipart) is using SVG filter primitives. These work in rsvg-convert:

<!-- Glow: blur then merge back with source (adds a luminous halo) -->
<filter id="glow">
  <feGaussianBlur stdDeviation="12" result="blur"/>
  <feMerge>
    <feMergeNode in="blur"/>
    <feMergeNode in="SourceGraphic"/>
  </feMerge>
</filter>

<!-- Drop shadow on text -->
<filter id="shadow">
  <feDropShadow dx="0" dy="0" stdDeviation="6"
    flood-color="#c4a0ff" flood-opacity="0.4"/>
</filter>

<!-- Organic distortion via turbulence -->
<filter id="distort">
  <feTurbulence type="fractalNoise" baseFrequency="0.012"
    numOctaves="3" result="noise"/>
  <feDisplacementMap in="SourceGraphic" in2="noise"
    scale="8" xChannelSelector="R" yChannelSelector="G"/>
</filter>
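Every frame repeats the same filter definitions, so they can live in one helper that gets pasted into each frame's SVG (a hypothetical refactor; the snippet above inlines only the glow filter per frame):

```python
def filter_defs():
    # Shared <defs> block embedded in every frame; the ids match the
    # filter="url(#...)" references used by frame elements.
    return """<defs>
  <filter id="glow">
    <feGaussianBlur stdDeviation="12" result="blur"/>
    <feMerge><feMergeNode in="blur"/><feMergeNode in="SourceGraphic"/></feMerge>
  </filter>
  <filter id="shadow">
    <feDropShadow dx="0" dy="0" stdDeviation="6"
      flood-color="#c4a0ff" flood-opacity="0.4"/>
  </filter>
  <filter id="distort">
    <feTurbulence type="fractalNoise" baseFrequency="0.012"
      numOctaves="3" result="noise"/>
    <feDisplacementMap in="SourceGraphic" in2="noise"
      scale="8" xChannelSelector="R" yChannelSelector="G"/>
  </filter>
</defs>"""
```

This keeps frame-generation code focused on the animation math, and any filter tweak applies to all frames at once.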

Parallel Rendering — 7x Speedup

Single-threaded rsvg-convert takes ~280ms per frame. For a 30-second Short at 24fps that's 720 frames, or about 3.5 minutes just for rendering. With multiprocessing:

import multiprocessing, subprocess

def render_frame(args):
    svg_path, png_path = args
    result = subprocess.run(
        ["rsvg-convert", "-w", "1080", "-h", "1920", svg_path, "-o", png_path],
        capture_output=True
    )
    # capture_output hides rsvg-convert's stderr; surface failures explicitly
    if result.returncode != 0:
        raise RuntimeError(f"{svg_path}: {result.stderr.decode().strip()}")

# Render all frames in parallel
pairs = [(f"/tmp/frames/frame_{i:04d}.svg", f"/tmp/frames/frame_{i:04d}.png")
         for i in range(total_frames)]

with multiprocessing.Pool(processes=8) as pool:
    pool.map(render_frame, pairs, chunksize=12)

On a VPS with 8 CPUs: 720 frames in about 12 seconds instead of 3.5 minutes. Total pipeline (generate SVGs + render PNGs + ffmpeg assembly) runs in ~40 seconds for a 30-second video.
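One guard worth adding before kicking off the pool (my addition, not in the original script): verify rsvg-convert is actually installed, so a missing binary fails once with a clear message instead of 720 times inside worker processes.

```python
import shutil, sys

def check_renderer(tool="rsvg-convert"):
    # Fail fast with one clear error instead of hundreds of silent
    # subprocess failures hidden by capture_output.
    path = shutil.which(tool)
    if path is None:
        sys.exit(f"{tool} not found -- install librsvg (e.g. apt install librsvg2-bin)")
    return path
```

Call it once at pipeline start; on Debian/Ubuntu the binary ships in the librsvg2-bin package.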

ffmpeg Assembly

Once you have PNG frames and a voiceover MP3:

# Assemble with color grading + film grain + voiceover
ffmpeg -y \
  -framerate 24 \
  -i /tmp/frames/frame_%04d.png \
  -i voiceover.mp3 \
  -vf "
    eq=contrast=1.08:saturation=0.9:brightness=-0.03,
    noise=alls=3:allf=t+u,
    vignette=angle=PI/5:mode=backward:eval=init
  " \
  -c:v libx264 -crf 20 -pix_fmt yuv420p \
  -c:a aac -b:a 128k -shortest \
  output.mp4

The noise filter adds film grain. The vignette darkens edges. eq does color grading — values vary by content theme.
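The same command is straightforward to build from Python per theme (a sketch; build_ffmpeg_cmd is my own name, and the default grade mirrors the eq values above):

```python
import subprocess

def build_ffmpeg_cmd(frames_dir, voiceover, out_path,
                     grade="eq=contrast=1.08:saturation=0.9:brightness=-0.03"):
    # Per-theme color grade goes first in the chain; grain and vignette
    # are the same for every theme.
    vf = (f"{grade},"
          "noise=alls=3:allf=t+u,"
          "vignette=angle=PI/5:mode=backward:eval=init")
    return ["ffmpeg", "-y",
            "-framerate", "24",
            "-i", f"{frames_dir}/frame_%04d.png",
            "-i", voiceover,
            "-vf", vf,
            "-c:v", "libx264", "-crf", "20", "-pix_fmt", "yuv420p",
            "-c:a", "aac", "-b:a", "128k", "-shortest",
            out_path]

# Usage:
# subprocess.run(build_ffmpeg_cmd("/tmp/frames", "voiceover.mp3", "output.mp4"), check=True)
```

Passing an argument list (rather than a shell string) avoids quoting problems when the filtergraph contains colons and commas.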

Themes

Different content types get different visual treatments:

THEMES = {
    "stoic":       {"bg": "#0d0c08", "accent": "#d4a574", "grade": "eq=contrast=1.1:saturation=0.75"},
    "neuroscience":{"bg": "#060a12", "accent": "#00d4ff", "grade": "eq=contrast=1.12:saturation=1.2"},
    "existential": {"bg": "#08070d", "accent": "#c4a0ff", "grade": "eq=contrast=1.06:saturation=0.9"},
    "spiritual":   {"bg": "#0a0808", "accent": "#ffd700", "grade": "eq=contrast=1.08:saturation=1.1"},
}

Theme is auto-detected from the script content using keyword matching before generation starts.
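The detection step can be sketched as simple keyword counting (the keyword lists below are invented for illustration, and detect_theme is my name, not necessarily the pipeline's):

```python
# Hypothetical keyword lists -- the real pipeline's lists may differ.
THEME_KEYWORDS = {
    "stoic":        ["marcus", "seneca", "virtue", "stoic"],
    "neuroscience": ["dopamine", "brain", "neuron", "cortex"],
    "existential":  ["meaning", "absurd", "void", "mortality"],
    "spiritual":    ["soul", "divine", "presence", "sacred"],
}

def detect_theme(script_text, default="existential"):
    # Count keyword hits per theme; the highest count wins, and a script
    # with no hits falls back to the default theme.
    text = script_text.lower()
    scores = {name: sum(text.count(kw) for kw in kws)
              for name, kws in THEME_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

Substring counting is crude but fast, and the theme only selects colors and a grade string, so a borderline misclassification still produces a watchable video.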

What Didn't Work

- Stock footage (Pexels): generic visuals, zero engagement.
- AI image generation (Pollinations/Flux): API rate limits made the pipeline too slow.

Results

The SVG format performs significantly better than the old stock footage and AI image slideshows: 30-70 day-one views per video versus 10-20 with the old format. Still early data — tracking weekly.

The full pipeline is in autonomous-agent-guide on GitHub. The script, svg-short.py, handles everything from theme detection to upload.