I run a YouTube channel that's 100% automated. An AI agent generates the script, renders the video, and uploads it — no human touches it. Here's the video pipeline I built to make that work without stock footage or expensive video generation APIs.
The original pipeline used stock footage from Pexels. It looked generic. Engagement was zero. I tried AI image generation (Pollinations/Flux) but the API rate limits made it slow.
SVG is different. It's just text. You can generate 700 frames of a unique, on-brand animated video in Python with zero API calls. The visuals are consistent, themeable, and fast to render.
Each frame is an SVG string built in Python. The animation comes from changing values based on the frame number: no JavaScript, no CSS animations. rsvg-convert (from the librsvg2-bin package on Debian/Ubuntu) renders each frame to a static PNG.
import math
import os

def make_frame(frame_num, total_frames, theme):
    t = frame_num / total_frames  # 0.0 to 1.0

    # Breathing nebula: radius oscillates with sin()
    neb_r = 280 + 30 * math.sin(frame_num * 0.01)
    neb_opacity = 0.025

    # Text fades in during the first 20% of the clip
    text_opacity = min(1.0, t / 0.20)

    svg = f"""<svg xmlns="http://www.w3.org/2000/svg" width="1080" height="1920">
  <rect width="1080" height="1920" fill="{theme['bg']}"/>
  <defs>
    <filter id="glow">
      <feGaussianBlur stdDeviation="12" result="blur"/>
      <feMerge><feMergeNode in="blur"/><feMergeNode in="SourceGraphic"/></feMerge>
    </filter>
  </defs>
  <circle cx="540" cy="960" r="{neb_r:.0f}" fill="{theme['accent']}"
          opacity="{neb_opacity}" filter="url(#glow)"/>
  <text x="540" y="900" font-size="72" fill="{theme['accent']}"
        text-anchor="middle" opacity="{text_opacity:.2f}"
        filter="url(#glow)">TOPIC TITLE</text>
</svg>"""

    os.makedirs("/tmp/frames", exist_ok=True)
    with open(f"/tmp/frames/frame_{frame_num:04d}.svg", "w") as f:
        f.write(svg)
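The linear fade above works, but eased curves read as more organic on screen. A small sketch of the idea; these helper names are mine, not part of the pipeline:

```python
import math

def ease_in_out(t: float) -> float:
    """Smoothstep: slow start, slow end, for t in [0, 1]."""
    return t * t * (3 - 2 * t)

def fade_in(t: float, duration: float = 0.20) -> float:
    """Linear opacity ramp over the first `duration` of the clip."""
    return min(1.0, t / duration)

def breathe(frame_num: int, base: float = 280.0, amp: float = 30.0) -> float:
    """The nebula radius from make_frame, factored out."""
    return base + amp * math.sin(frame_num * 0.01)
```

Swapping `t / 0.20` for `ease_in_out(fade_in(t))` softens the start and end of the fade without touching anything else in the frame builder.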
The key to making SVG look good (not like clipart) is using SVG filter primitives. These work in rsvg-convert:
<!-- Glow: blur then merge back with source (adds a luminous halo) -->
<filter id="glow">
  <feGaussianBlur stdDeviation="12" result="blur"/>
  <feMerge>
    <feMergeNode in="blur"/>
    <feMergeNode in="SourceGraphic"/>
  </feMerge>
</filter>

<!-- Drop shadow on text -->
<filter id="shadow">
  <feDropShadow dx="0" dy="0" stdDeviation="6"
                flood-color="#c4a0ff" flood-opacity="0.4"/>
</filter>

<!-- Organic distortion via turbulence -->
<filter id="distort">
  <feTurbulence type="fractalNoise" baseFrequency="0.012"
                numOctaves="3" result="noise"/>
  <feDisplacementMap in="SourceGraphic" in2="noise"
                     scale="8" xChannelSelector="R" yChannelSelector="G"/>
</filter>
Single-threaded rsvg-convert takes ~280ms per frame. For a 30-second Short at 24fps that's 720 frames, or about 3.5 minutes just for rendering. With multiprocessing:
import multiprocessing
import subprocess

def render_frame(args):
    svg_path, png_path = args
    # check=True surfaces rsvg-convert failures instead of silently
    # producing a video with missing frames
    subprocess.run(
        ["rsvg-convert", "-w", "1080", "-h", "1920", svg_path, "-o", png_path],
        capture_output=True, check=True,
    )

# Render all frames in parallel
pairs = [(f"/tmp/frames/frame_{i:04d}.svg", f"/tmp/frames/frame_{i:04d}.png")
         for i in range(total_frames)]
with multiprocessing.Pool(processes=8) as pool:
    pool.map(render_frame, pairs, chunksize=12)
On a VPS with 8 CPUs: 720 frames in about 12 seconds instead of 3.5 minutes. Total pipeline (generate SVGs + render PNGs + ffmpeg assembly) runs in ~40 seconds for a 30-second video.
Once you have PNG frames and a voiceover MP3:
# Assemble with color grading + film grain + voiceover
ffmpeg -y \
  -framerate 24 \
  -i /tmp/frames/frame_%04d.png \
  -i voiceover.mp3 \
  -vf "eq=contrast=1.08:saturation=0.9:brightness=-0.03,noise=alls=3:allf=t+u,vignette=angle=PI/5:mode=backward:eval=init" \
  -c:v libx264 -crf 20 -pix_fmt yuv420p \
  -c:a aac -b:a 128k -shortest \
  output.mp4
The noise filter adds film grain. The vignette darkens edges. eq does color grading — values vary by content theme.
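Since the eq grade differs per theme, it is convenient to build the command in Python rather than hardcode a shell line. A sketch, assuming the frame and voiceover paths above (`build_ffmpeg_cmd` is a hypothetical helper, not from the actual script):

```python
def build_ffmpeg_cmd(grade: str, out_path: str = "output.mp4") -> list[str]:
    # Filter chain: per-theme color grade, then film grain, then vignette.
    vf = (f"{grade},noise=alls=3:allf=t+u,"
          f"vignette=angle=PI/5:mode=backward:eval=init")
    return [
        "ffmpeg", "-y",
        "-framerate", "24",
        "-i", "/tmp/frames/frame_%04d.png",
        "-i", "voiceover.mp3",
        "-vf", vf,
        "-c:v", "libx264", "-crf", "20", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "128k", "-shortest",
        out_path,
    ]
```

Then `subprocess.run(build_ffmpeg_cmd(THEMES[theme]["grade"]), check=True)` assembles the final video with the right grade for the detected theme.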
Different content types get different visual treatments:
THEMES = {
"stoic": {"bg": "#0d0c08", "accent": "#d4a574", "grade": "eq=contrast=1.1:saturation=0.75"},
"neuroscience":{"bg": "#060a12", "accent": "#00d4ff", "grade": "eq=contrast=1.12:saturation=1.2"},
"existential": {"bg": "#08070d", "accent": "#c4a0ff", "grade": "eq=contrast=1.06:saturation=0.9"},
"spiritual": {"bg": "#0a0808", "accent": "#ffd700", "grade": "eq=contrast=1.08:saturation=1.1"},
}
Theme is auto-detected from the script content using keyword matching before generation starts.
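The matcher can be as simple as counting keyword hits per theme and taking the highest score. A sketch of that approach; the keyword lists below are illustrative, not the channel's actual lists:

```python
THEME_KEYWORDS = {
    "stoic":        ["virtue", "discipline", "seneca", "marcus aurelius"],
    "neuroscience": ["dopamine", "neuron", "cortex", "brain"],
    "existential":  ["absurd", "meaning", "void", "mortality"],
    "spiritual":    ["soul", "divine", "awakening", "presence"],
}

def detect_theme(script: str, default: str = "existential") -> str:
    text = script.lower()
    # Count keyword hits per theme; fall back to a default on no match.
    scores = {name: sum(kw in text for kw in kws)
              for name, kws in THEME_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

Crude, but with themed scripts the vocabulary is distinctive enough that simple substring counting rarely picks the wrong bucket.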
Text is set in DejaVu Serif and DejaVu Sans. The SVG format performs significantly better than the stock footage and AI image slideshows: 30-70 views per video on the first day vs 10-20 with the old format. Still early data; I'm tracking weekly.
The whole pipeline lives in a single script, svg-short.py, and handles everything from theme detection to upload.