
AI Video Upscaler

Real-ESRGAN EDSR FFmpeg AMD VCE OpenCV Vulkan Python 3.13

Project Category: Individual Project (Personal Productivity Tool)

I engineered a frame-by-frame video upscaling pipeline optimized for AMD GPUs on Windows. My architecture supports two AI methods (GAN-based Real-ESRGAN and CNN-based EDSR), implements AMD VCE hardware encoding with automatic CPU fallback for high-resolution outputs (>4000px), and processes frames one at a time to stay within VRAM limits. The tool handles 30+ minute videos at 4x upscale (1920x1080 → 7680x4320) while preserving audio tracks.

Technologies & Tools Used

Real-ESRGAN (GAN Architecture)

A Generative Adversarial Network (GAN) trained on degraded images to learn realistic upscaling patterns. It uses Vulkan acceleration for GPU inference with no CUDA dependency. Real-ESRGAN processes each frame individually via the realesrgan-ncnn-vulkan.exe executable, taking roughly 2-5 seconds per frame and recovering more detail than classical interpolation.
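A minimal sketch of how a single frame might be handed to the executable from Python. The -i, -o, -n, -s and -f flags are the ones documented by realesrgan-ncnn-vulkan; the function names and the default model are assumptions for illustration, not the project's actual code.

```python
import subprocess


def realesrgan_command(exe, src, dst, scale=4, model="realesrgan-x4plus"):
    """Build the argument list for one frame (hypothetical helper)."""
    return [
        str(exe),
        "-i", str(src),      # input frame
        "-o", str(dst),      # upscaled output frame
        "-n", model,         # model name, resolved inside the models/ folder
        "-s", str(scale),    # upscale ratio
        "-f", "png",         # output image format
    ]


def upscale_frame(exe, src, dst, scale=4, model="realesrgan-x4plus"):
    """Run the upscaler and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(realesrgan_command(exe, src, dst, scale, model), check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.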

EDSR (CNN Architecture)

Enhanced Deep Super-Resolution using Convolutional Neural Networks. Trained on high-resolution image datasets, EDSR uses residual scaling to preserve details during upscaling. Requires more processing time (15-30 seconds per frame) but runs on CPU, making it suitable for systems without compatible GPUs.

AMD Video Codec Engine (VCE)

Hardware-accelerated H.264 encoding on AMD GPUs via the Advanced Media Framework (AMF). VCE offloads video encoding from CPU to GPU's dedicated encoding blocks, achieving real-time encoding speeds (30+ FPS). My implementation uses FFmpeg's h264_amf encoder with automatic fallback to libx264 CPU encoding when VCE fails (common with >4000px width or >2000px height videos).

FFmpeg & OpenCV

FFmpeg handles video demuxing (extract frames/audio), classical upscaling (Lanczos interpolation), and video muxing (reassemble frames + audio). OpenCV reads video metadata (FPS, resolution, frame count) and performs frame verification post-upscaling. I use FFmpeg's -c:v h264_amf for AMD GPU encoding and -c:a aac for audio preservation.
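Reading the metadata with OpenCV could look roughly like the sketch below. VideoCapture and the CAP_PROP_* constants are standard OpenCV; the VideoInfo dataclass and read_video_info are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoInfo:
    width: int
    height: int
    fps: float
    frame_count: int

    @property
    def duration_s(self) -> float:
        # Guard against containers that report an FPS of 0.
        return self.frame_count / self.fps if self.fps > 0 else 0.0


def read_video_info(path: str) -> VideoInfo:
    """Read resolution, FPS and frame count from a video file."""
    import cv2  # opencv-python; imported here so VideoInfo works without it

    cap = cv2.VideoCapture(path)
    try:
        if not cap.isOpened():
            raise OSError(f"cannot open video: {path}")
        return VideoInfo(
            width=int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            height=int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
            fps=float(cap.get(cv2.CAP_PROP_FPS)),
            frame_count=int(cap.get(cv2.CAP_PROP_FRAME_COUNT)),
        )
    finally:
        cap.release()
```

The frame count reported by the container is an estimate for some formats, so counting the extracted PNG files is a safer total for the progress bar.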

My Frame-by-Frame Pipeline Architecture

I designed this sequential processing system so that video length is limited by disk space rather than VRAM:

1. Frame Extraction

Extract all video frames to PNG files using FFmpeg -vf fps=original_fps, storing in timestamped directories (e.g., output/20251018-1200-12345_frames/)

2. Sequential Upscaling

Process each frame individually with Real-ESRGAN or EDSR. My progress bar tracks frame_current/frame_total with real-time ETA calculation

3. Dimension Verification

My code checks upscaled frame dimensions using OpenCV imread() to ensure width == original_width * scale before reassembly

4. Video Reassembly

Combine upscaled frames using FFmpeg with AMD VCE encoding. My fallback logic detects VCE failures via exit codes and retries with CPU encoding

5. Audio Preservation

Extract original audio with -vn -acodec copy, then mux back using -i upscaled_video.mp4 -i original_audio.aac -c copy

6. Cleanup Management

Prompt user to keep/delete frame folders. My batch interface asks: "Keep extracted frames? (y/n)" with automatic cleanup on 'n'.
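Steps 2 and 3 above can be sketched as two small helpers: a linear ETA estimate for the progress bar and a size check on each upscaled frame. All function names are hypothetical, and the ETA formula (average time per finished frame times frames remaining) is an assumption about how such a counter is usually computed.

```python
def eta_seconds(frames_done, frames_total, elapsed_s):
    """Estimate remaining time from the average time per finished frame."""
    if frames_done <= 0:
        return None  # nothing finished yet, so no estimate is possible
    per_frame = elapsed_s / frames_done
    return per_frame * (frames_total - frames_done)


def expected_size(width, height, scale):
    """Return the (width, height) an upscaled frame should have."""
    return width * scale, height * scale


def frame_has_expected_size(frame_path, width, height, scale):
    """Check one upscaled frame on disk against the expected dimensions."""
    import cv2  # opencv-python; imported here so the pure helpers work without it

    img = cv2.imread(str(frame_path))
    if img is None:  # imread returns None for missing or unreadable files
        return False
    h, w = img.shape[:2]  # OpenCV arrays are (rows, cols, channels)
    return (w, h) == expected_size(width, height, scale)
```

Checking both width and height catches frames that were truncated or written by a run with a different scale.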

Problem I Solved: VRAM Limitations

Original Approach (Failed): Load entire video into VRAM → Process all frames in batch → Output video
Issue: 30-minute 1080p video = ~54,000 frames × 8MB/frame = 432GB VRAM required (impossible on consumer GPUs)

My Solution: Frame-by-frame sequential processing
Extract frames to disk (FFmpeg)
Process one frame at a time (Real-ESRGAN reads frame → upscales → saves → releases memory)
Reassemble from disk using FFmpeg

Result: Peak VRAM usage = 1 frame (~8MB) regardless of video length. Can now process 2-hour videos on 8GB GPUs.
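The arithmetic behind those figures, as a short check. It assumes a 30 fps source (which is what 54,000 frames in 30 minutes implies) and 4 bytes per pixel for an uncompressed frame; the function names are illustrative.

```python
def frame_count(minutes, fps):
    """Number of frames in a clip of the given length."""
    return round(minutes * 60 * fps)


def rgba_frame_mb(width, height):
    """Size of one uncompressed 8-bit RGBA frame in megabytes (10**6 bytes)."""
    return width * height * 4 / 1e6


def batch_memory_gb(frames, mb_per_frame=8.0):
    """Memory needed to hold every frame at once, in gigabytes."""
    return frames * mb_per_frame / 1000


frames = frame_count(30, 30)
print(frames, batch_memory_gb(frames))  # 54000 432.0
```

A 1920x1080 RGBA frame works out to about 8.3 MB, which is where the rounded 8MB per-frame figure comes from.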

Upscaling Methods Comparison

Real-ESRGAN (AI - GAN)

  • Architecture: Generative Adversarial Network
  • Processing: 2-5 seconds per frame
  • Quality: Superior detail recovery, realistic textures
  • Hardware: Vulkan GPU acceleration required
  • Use Case: Photos, anime, live-action footage
  • My Implementation: Calls realesrgan-ncnn-vulkan.exe with model selection based on scale (2x/3x/4x)

EDSR (AI - CNN)

  • Architecture: Convolutional Neural Network
  • Processing: 15-30 seconds per frame
  • Quality: Excellent sharpness, minimal artifacts
  • Hardware: CPU-compatible (no GPU required)
  • Use Case: Screen recordings, technical diagrams
  • My Implementation: Loads EDSR_x{scale}.pb models via OpenCV DNN module
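A rough sketch of loading an EDSR model through OpenCV, assuming the dnn_superres module from opencv-contrib-python is what "OpenCV DNN module" refers to here. DnnSuperResImpl_create, readModel, setModel and upsample are real OpenCV calls; the helper names and the models directory argument are assumptions.

```python
from pathlib import Path


def edsr_model_path(models_dir, scale):
    """Resolve the EDSR_x{scale}.pb file for a supported scale."""
    if scale not in (2, 3, 4):
        raise ValueError(f"unsupported EDSR scale: {scale}")
    return Path(models_dir) / f"EDSR_x{scale}.pb"


def load_edsr(models_dir, scale):
    """Create a super-resolution object with the EDSR weights loaded."""
    import cv2  # needs opencv-contrib-python for cv2.dnn_superres

    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel(str(edsr_model_path(models_dir, scale)))
    sr.setModel("edsr", scale)  # algorithm name and scale must match the file
    return sr


def upscale_frame(sr, src, dst):
    """Upscale one frame from disk and write the result back to disk."""
    import cv2

    img = cv2.imread(str(src))
    if img is None:
        raise OSError(f"cannot read frame: {src}")
    cv2.imwrite(str(dst), sr.upsample(img))
```

Loading the model once and reusing the object for every frame avoids paying the model load cost per frame.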

AMD Hardware Encoding with Intelligent Fallback

I engineered automatic encoding strategy selection based on resolution:

My AMD VCE Implementation

# Primary encoding strategy (AMD GPU)
ffmpeg -framerate {fps} -i frames/%06d.png -i audio.aac \
       -c:v h264_amf -quality quality -rc cqp -qp_i 18 -qp_p 18 \
       -c:a aac -b:a 192k output.mp4

My CPU Fallback Logic

# Triggered when width > 4000px OR height > 2000px
ffmpeg -framerate {fps} -i frames/%06d.png -i audio.aac \
       -c:v libx264 -preset ultrafast -crf 28 \
       -c:a aac -b:a 192k output.mp4
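The selection and retry logic described above could be sketched as follows. The resolution thresholds and encoder options come from the text (with the constant QP expressed through h264_amf's qp_i and qp_p options); the function names, the injectable run parameter, and the -y overwrite flag are assumptions added for illustration.

```python
import subprocess


def needs_cpu_encoder(width, height):
    """Resolution rule from the text: skip VCE above 4000px wide or 2000px tall."""
    return width > 4000 or height > 2000


def encode_command(encoder, fps, frames_pattern, audio, output):
    """Build the FFmpeg reassembly command for one encoder."""
    if encoder == "h264_amf":
        video_opts = ["-c:v", "h264_amf", "-quality", "quality",
                      "-rc", "cqp", "-qp_i", "18", "-qp_p", "18"]
    elif encoder == "libx264":
        video_opts = ["-c:v", "libx264", "-preset", "ultrafast", "-crf", "28"]
    else:
        raise ValueError(f"unknown encoder: {encoder}")
    # -y overwrites a partial file left behind by a failed earlier attempt.
    return ["ffmpeg", "-y", "-framerate", str(fps), "-i", frames_pattern,
            "-i", audio, *video_opts, "-c:a", "aac", "-b:a", "192k", output]


def encode_with_fallback(fps, frames_pattern, audio, output, width, height,
                         run=subprocess.run):
    """Try AMD VCE first when the resolution allows it, then fall back to CPU."""
    if needs_cpu_encoder(width, height):
        encoders = ["libx264"]
    else:
        encoders = ["h264_amf", "libx264"]
    for encoder in encoders:
        result = run(encode_command(encoder, fps, frames_pattern, audio, output))
        if result.returncode == 0:  # FFmpeg signals failure via its exit code
            return encoder
    raise RuntimeError("all encoders failed")
```

The return value names the encoder that succeeded, which is enough to print a summary line such as the encoding entry in the final report.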

Performance Benchmarks

Scale  Method            Processing Speed         Quality    GPU Usage
2x     Real-ESRGAN       ~5-10 fps                Excellent  Vulkan
3x     Real-ESRGAN       ~2-5 fps                 Excellent  Vulkan
4x     Real-ESRGAN       ~1-3 fps (Recommended)   Excellent  Vulkan
2x     EDSR (AI CNN)     ~15-30 fps               Good       CPU
4x     FFmpeg Classical  ~5-15 fps                Fair       CPU

Note: Speeds measured with 1920x1080 input on an AMD RX 6800 XT. Processing time in seconds = total_frames / processing speed in fps.
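As a worked instance of that formula, a tiny helper (the name is illustrative) that converts a frame count and a processing speed into minutes:

```python
def estimated_minutes(total_frames, frames_per_second):
    """Processing time in minutes for a given throughput in frames per second."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return total_frames / frames_per_second / 60


# A 30-second clip at 30 fps has 900 frames.
print(estimated_minutes(900, 1.0))  # 15.0
print(estimated_minutes(900, 3.0))  # 5.0
```

So at 1-3 fps, a 900-frame clip takes somewhere between 5 and 15 minutes to upscale, before encoding.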

Usage

Interactive Mode (Recommended)

# From helper_tools root
video_upscaler.bat

Command Line - Basic

py video_upscaler.py "input_video.mp4"

Advanced Options

# AI upscaling with custom scale
py video_upscaler.py input.mp4 --method realesrgan --scale 4

# Fast classical upscaling (FFmpeg Lanczos)
py video_upscaler.py input.mp4 --method ffmpeg --scale 2

# Extract frames only (for manual processing)
py video_upscaler.py input.mp4 --method extract

# Process existing frames to GIF
py video_upscaler.py input.mp4 --method process_existing --format gif

# Custom output path
py video_upscaler.py input.mp4 --output enhanced_video.mp4
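For orientation, an argparse definition that would accept the commands above. This is a guess at the interface, not the script's actual parser: the flag names and the method and format values are taken from the examples, while the "edsr" choice, the defaults, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical CLI matching the usage examples."""
    parser = argparse.ArgumentParser(
        prog="video_upscaler.py",
        description="Frame-by-frame video upscaling",
    )
    parser.add_argument("input", help="path to the input video")
    parser.add_argument(
        "--method",
        choices=["realesrgan", "edsr", "ffmpeg", "extract", "process_existing"],
        default="realesrgan",  # assumed default
        help="upscaling method or pipeline stage to run",
    )
    parser.add_argument("--scale", type=int, choices=[2, 3, 4], default=4,
                        help="upscale factor (assumed default: 4)")
    parser.add_argument("--format", choices=["mp4", "gif"], default="mp4",
                        help="output container (assumed default: mp4)")
    parser.add_argument("--output", help="custom output path")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Restricting --method and --scale with choices makes argparse reject unsupported values before any frames are extracted.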

Directory Structure

video_upscaler/
├── video_upscaler.py          # My main script
├── realesrgan-windows/        # Vulkan executables
│   ├── realesrgan-ncnn-vulkan.exe
│   └── models/
│       ├── realesr-animevideov3-x2.param
│       ├── realesr-animevideov3-x4.param
│       └── realesrgan-x4plus.param
├── models/                    # EDSR AI models
│   ├── EDSR_x2.pb
│   ├── EDSR_x3.pb
│   └── EDSR_x4.pb
├── input/                     # Place input videos here
└── output/                    # Enhanced videos and frame folders
    ├── video_name_upscaled_x4.mp4
    ├── 20251018-1200-12345_frames/
    └── 20251018-1200-12345_frames_upscaled_x4/

Example Output

======================================================================
 Video Upscaling: sample_video.mp4
Scale: 4x | Method: Real-ESRGAN
Output: sample_video_upscaled_x4.mp4
======================================================================

 Video Info:
   Resolution: 1920x1080
   FPS: 30.0
   Total frames: 900
   Duration: 30.0s (0.5 minutes)
   Target resolution: 7680x4320

 Processing 900 frames...

[] 100.0% - Frame 900/900  7680x4320

 Reassembling video with AMD VCE encoding...
 Preserving original audio track...

======================================================================
 VIDEO UPSCALING COMPLETE
======================================================================
Input: sample_video.mp4
Output: sample_video_upscaled_x4.mp4
Scale factor: 4x
Resolution: 1920x1080 → 7680x4320
Frames processed: 900/900
FPS: 30.0
Duration: 30.0s
Processing time: 15 minutes
Encoding: AMD VCE (h264_amf)
======================================================================

Troubleshooting

"AMD encoding failed" Error

My code automatically retries with CPU encoding. If you see this message, the tool is switching from h264_amf to libx264.

High-Resolution CPU Fallback

Out of Memory

Use Cases
