
AI Video Upscaler

Real-ESRGAN · EDSR · FFmpeg · AMD VCE · OpenCV · Vulkan · Python 3.13


Project Category: Individual Project (Personal Productivity Tool)

I engineered a frame-by-frame video upscaling pipeline optimized for AMD GPUs on Windows. My architecture supports two AI methods (GAN-based Real-ESRGAN and CNN-based EDSR), implements AMD VCE hardware encoding with automatic CPU fallback for high-resolution outputs (>4000px), and processes frames sequentially to bypass VRAM limitations. The tool handles 30+ minute videos at up to 4x upscale (for example, 1080p → 8K at 4x) while preserving audio tracks.

Technologies & Tools Used

Real-ESRGAN (GAN Architecture)

A Generative Adversarial Network (GAN) trained on degraded images to learn realistic upscaling patterns. It uses Vulkan acceleration for GPU inference without a CUDA dependency. Real-ESRGAN processes each frame individually via the realesrgan-ncnn-vulkan.exe executable, taking about 2-5 seconds per frame and recovering more detail than classical interpolation methods.

EDSR (CNN Architecture)

Enhanced Deep Super-Resolution using Convolutional Neural Networks. Trained on high-resolution image datasets, EDSR uses residual scaling to preserve details during upscaling. Requires more processing time (15-30 seconds per frame) but runs on CPU, making it suitable for systems without compatible GPUs.

AMD Video Codec Engine (VCE)

Hardware-accelerated H.264 encoding on AMD GPUs via the Advanced Media Framework (AMF). VCE offloads video encoding from the CPU to the GPU's dedicated encoding blocks, achieving real-time encoding speeds (30+ FPS). My implementation uses FFmpeg's h264_amf encoder with automatic fallback to libx264 CPU encoding when VCE fails (common with >4000px width or >2000px height videos).

FFmpeg & OpenCV

FFmpeg handles video demuxing (extract frames/audio), classical upscaling (Lanczos interpolation), and video muxing (reassemble frames + audio). OpenCV reads video metadata (FPS, resolution, frame count) and performs frame verification post-upscaling. I use FFmpeg's -c:v h264_amf for AMD GPU encoding and -c:a aac for audio preservation.

My Frame-by-Frame Pipeline Architecture

I designed this sequential processing system to handle unlimited video lengths without VRAM constraints:

1. Frame Extraction

Extract all video frames to PNG files using FFmpeg -vf fps=original_fps, storing in timestamped directories (e.g., output/20251018-1200-12345_frames/)

2. Sequential Upscaling

Process each frame individually with Real-ESRGAN or EDSR. My progress bar tracks frame_current/frame_total with real-time ETA calculation

3. Dimension Verification

My code checks upscaled frame dimensions using OpenCV imread() to ensure width == original_width * scale before reassembly

4. Video Reassembly

Combine upscaled frames using FFmpeg with AMD VCE encoding. My fallback logic detects VCE failures via exit codes and retries with CPU encoding

5. Audio Preservation

Extract original audio with -vn -acodec copy, then mux back using -i upscaled_video.mp4 -i original_audio.aac -c copy

6. Cleanup Management

Prompt user to keep/delete frame folders. My batch interface asks: "Keep extracted frames? (y/n)" with automatic cleanup on 'n'
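Steps 1 to 3 above can be sketched as three small helpers. This is a minimal illustration and not the code from video_upscaler.py: the function names, the frames directory argument, and the %06d.png naming (borrowed from the reassembly commands further down) are assumptions.

```python
def extract_frames_cmd(video: str, frames_dir: str, fps: float) -> list[str]:
    """Step 1: build an FFmpeg command that writes every frame as a PNG."""
    # -vf fps=<original fps> keeps the source frame rate, as described above.
    # FFmpeg accepts forward slashes in paths on Windows as well.
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", f"{frames_dir}/%06d.png"]


def eta_seconds(frames_done: int, frames_total: int, elapsed: float) -> float:
    """Step 2: estimate remaining time from the average time per finished frame."""
    if frames_done <= 0:
        return float("inf")  # nothing finished yet, no basis for an estimate
    return (frames_total - frames_done) * (elapsed / frames_done)


def dimensions_ok(upscaled_shape, original_size, scale: int) -> bool:
    """Step 3: compare an upscaled frame with original size times scale.

    upscaled_shape is (height, width, ...), the layout of the .shape
    attribute of an array returned by cv2.imread().
    original_size is (width, height) of the source video.
    """
    height, width = upscaled_shape[:2]
    orig_w, orig_h = original_size
    return width == orig_w * scale and height == orig_h * scale
```

For example, after 300 of 900 frames in 600 seconds the average is 2 seconds per frame, so eta_seconds(300, 900, 600.0) reports 1200 seconds remaining.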

Problem I Solved: VRAM Limitations

Original Approach (Failed): Load entire video into VRAM → Process all frames in batch → Output video
Issue: 30-minute 1080p video = ~54,000 frames × 8MB/frame = 432GB VRAM required (impossible on consumer GPUs)

My Solution: Frame-by-frame sequential processing
Extract frames to disk (FFmpeg)
Process one frame at a time (Real-ESRGAN reads frame → upscales → saves → releases memory)
Reassemble from disk using FFmpeg

Result: Peak frame memory is bounded by a single frame (~8MB for the input, plus its upscaled output and the model weights) regardless of video length. Can now process 2-hour videos on 8GB GPUs.
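The arithmetic behind the 432GB figure can be reproduced in a few lines. This sketch assumes 30 fps and that the ~8MB per frame corresponds to an uncompressed 1920x1080 frame at 4 bytes per pixel; the function names are illustrative only.

```python
def raw_frame_mb(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Size of one uncompressed frame in decimal megabytes."""
    return width * height * bytes_per_pixel / 1_000_000


def batch_memory_gb(minutes: float, fps: float, mb_per_frame: float) -> tuple[int, float]:
    """Frame count and total memory (decimal GB) if every frame were held at once."""
    frames = round(minutes * 60 * fps)
    return frames, frames * mb_per_frame / 1000


print(raw_frame_mb(1920, 1080))     # roughly 8.3 MB per frame
print(batch_memory_gb(30, 30, 8))   # 54,000 frames and 432 GB for the batch approach
```

With sequential processing the second figure no longer grows with video length, since only one frame is resident at a time.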

Upscaling Methods Comparison

Real-ESRGAN (AI - GAN)

Architecture: Generative Adversarial Network
Processing: 2-5 seconds per frame
Quality: Superior detail recovery, realistic textures
Hardware: Vulkan GPU acceleration required
Use Case: Photos, anime, live-action footage
My Implementation: Calls realesrgan-ncnn-vulkan.exe with model selection based on scale (2x/3x/4x)
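A per-frame call to the executable might be assembled as below. The -i, -o, -n and -s flags are the ones documented for realesrgan-ncnn-vulkan; the default model and the validation rules are assumptions for illustration, since the exact scale-to-model mapping is not spelled out here.

```python
import subprocess


def realesrgan_cmd(exe: str, src: str, dst: str, scale: int,
                   model: str = "realesr-animevideov3") -> list[str]:
    """Build the command line for upscaling one frame (hypothetical helper)."""
    if scale not in (2, 3, 4):
        raise ValueError(f"unsupported scale: {scale}")
    # realesrgan-x4plus ships as a 4x-only model.
    if model == "realesrgan-x4plus" and scale != 4:
        raise ValueError("realesrgan-x4plus only supports 4x")
    return [exe, "-i", src, "-o", dst, "-n", model, "-s", str(scale)]


def upscale_frame(exe: str, src: str, dst: str, scale: int) -> None:
    """Run the upscaler on a single frame and raise if it exits non-zero."""
    subprocess.run(realesrgan_cmd(exe, src, dst, scale), check=True)
```

The chosen model needs matching .param and .bin files for the requested scale in the models folder next to the executable.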

EDSR (AI - CNN)

Architecture: Convolutional Neural Network
Processing: 15-30 seconds per frame
Quality: Excellent sharpness, minimal artifacts
Hardware: CPU-compatible (no GPU required)
Use Case: Screen recordings, technical diagrams
My Implementation: Loads EDSR_x{scale}.pb models via OpenCV DNN module
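One way to load an EDSR_x{scale}.pb file is OpenCV's dnn_superres module, which ships with the opencv-contrib-python package. Whether video_upscaler.py uses this exact interface is an assumption, and the helper names below are hypothetical.

```python
from pathlib import Path


def edsr_model_path(models_dir: str, scale: int) -> Path:
    """Resolve the model file for a given scale, e.g. models/EDSR_x4.pb."""
    if scale not in (2, 3, 4):
        raise ValueError(f"unsupported scale: {scale}")
    return Path(models_dir) / f"EDSR_x{scale}.pb"


def upscale_with_edsr(frame_path: str, out_path: str, models_dir: str, scale: int) -> None:
    """Upscale one frame on the CPU with EDSR."""
    import cv2  # imported here because dnn_superres requires opencv-contrib-python

    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel(str(edsr_model_path(models_dir, scale)))
    sr.setModel("edsr", scale)  # algorithm name and scale must match the .pb file

    image = cv2.imread(frame_path)
    if image is None:
        raise FileNotFoundError(frame_path)
    cv2.imwrite(out_path, sr.upsample(image))
```

In a real frame loop the model would be loaded once and reused, since reading the .pb file for every frame adds avoidable overhead.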

AMD Hardware Encoding with Intelligent Fallback

I engineered automatic encoding strategy selection based on resolution:

My AMD VCE Implementation

# Primary encoding strategy (AMD GPU)
ffmpeg -framerate {fps} -i frames/%06d.png -i audio.aac \
       -c:v h264_amf -quality quality -rc cqp -qp 18 \
       -c:a aac -b:a 192k output.mp4

Codec: h264_amf (AMD VCE hardware encoder)
Rate Control: -rc cqp (Constant Quantization Parameter for quality)
Quality: -qp 18 (visually lossless)
Speed: Real-time encoding (30+ FPS on RX 6800 XT)

My CPU Fallback Logic

# Triggered when width > 4000px OR height > 2000px
ffmpeg -framerate {fps} -i frames/%06d.png -i audio.aac \
       -c:v libx264 -preset ultrafast -crf 28 \
       -c:a aac -b:a 192k output.mp4

Codec: libx264 (CPU-based H.264 encoder)
Preset: ultrafast for high-res, fast for standard-res
Quality: -crf 28 for high-res (balanced size/quality), -crf 18 for standard-res
Reason: AMD VCE has encoder limits (~4096px max width) - CPU fallback ensures reliability
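The resolution rule and the libx264 settings listed above can be captured in a small decision helper. This is a sketch under the thresholds quoted in this section; the function names are illustrative and not taken from the project.

```python
AMF_MAX_WIDTH = 4000   # thresholds quoted in the text above
AMF_MAX_HEIGHT = 2000


def needs_cpu_fallback(width: int, height: int) -> bool:
    """True when the output exceeds the limits used for AMD VCE."""
    return width > AMF_MAX_WIDTH or height > AMF_MAX_HEIGHT


def choose_encoder(width: int, height: int) -> str:
    """Pick the FFmpeg video codec name for the first encoding attempt."""
    return "libx264" if needs_cpu_fallback(width, height) else "h264_amf"


def libx264_settings(width: int, height: int) -> tuple[str, int]:
    """(preset, crf) for CPU encoding: faster and smaller at high resolution."""
    return ("ultrafast", 28) if needs_cpu_fallback(width, height) else ("fast", 18)
```

Under these thresholds a 3840x2160 output already takes the CPU path, because its height exceeds 2000px even though its width is below 4000px.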

Performance Benchmarks

Scale	Method	Processing Speed	Quality	GPU Usage
2x	Real-ESRGAN	~5-10 fps	Excellent	Vulkan
3x	Real-ESRGAN	~2-5 fps	Excellent	Vulkan
4x	Real-ESRGAN	~1-3 fps (Recommended)	Excellent	Vulkan
2x	EDSR (AI CNN)	~15-30 fps	Good	CPU
4x	FFmpeg Classical	~5-15 fps	Fair	CPU

Note: Times based on 1920x1080 input on AMD RX 6800 XT. Processing time = total_frames / fps.
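As a worked example of the formula in the note, the following sketch converts a frame count and a throughput into wall-clock time. The function name is illustrative, and the throughput values are taken from the 4x Real-ESRGAN row.

```python
def processing_seconds(total_frames: int, frames_per_second: float) -> float:
    """Processing time = total_frames / fps, in seconds."""
    if frames_per_second <= 0:
        raise ValueError("throughput must be positive")
    return total_frames / frames_per_second


# A 30-second clip (900 frames) at 1 fps: 15 minutes.
print(processing_seconds(900, 1.0) / 60)

# A 30-minute video (54,000 frames) at 1 to 3 fps: between 15 and 5 hours.
print(processing_seconds(54_000, 1.0) / 3600, processing_seconds(54_000, 3.0) / 3600)
```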

Usage

Interactive Mode (Recommended)

# From helper_tools root
video_upscaler.bat

Command Line - Basic

py video_upscaler.py "input_video.mp4"

Advanced Options

# AI upscaling with custom scale
py video_upscaler.py input.mp4 --method realesrgan --scale 4

# Fast classical upscaling (FFmpeg Lanczos, no AI)
py video_upscaler.py input.mp4 --method ffmpeg --scale 2

# Extract frames only (for manual processing)
py video_upscaler.py input.mp4 --method extract

# Process existing frames to GIF
py video_upscaler.py input.mp4 --method process_existing --format gif

# Custom output path
py video_upscaler.py input.mp4 --output enhanced_video.mp4

Directory Structure

video_upscaler/
├── video_upscaler.py          # My main script
├── realesrgan-windows/        # Vulkan executables
│   ├── realesrgan-ncnn-vulkan.exe
│   └── models/
│       ├── realesr-animevideov3-x2.param
│       ├── realesr-animevideov3-x4.param
│       └── realesrgan-x4plus.param
├── models/                    # EDSR AI models
│   ├── EDSR_x2.pb
│   ├── EDSR_x3.pb
│   └── EDSR_x4.pb
├── input/                     # Place input videos here
└── output/                    # Enhanced videos and frame folders
    ├── video_name_upscaled_x4.mp4
    ├── 20251018-1200-12345_frames/
    └── 20251018-1200-12345_frames_upscaled_x4/

Example Output

======================================================================
 Video Upscaling: sample_video.mp4
Scale: 4x | Method: Real-ESRGAN
Output: sample_video_upscaled_x4.mp4
======================================================================

 Video Info:
   Resolution: 1920x1080
   FPS: 30.0
   Total frames: 900
   Duration: 30.0s (0.5 minutes)
   Target resolution: 7680x4320

 Processing 900 frames...

[] 100.0% - Frame 900/900  7680x4320

 Reassembling video with AMD VCE encoding...
 Preserving original audio track...

======================================================================
 VIDEO UPSCALING COMPLETE
======================================================================
Input: sample_video.mp4
Output: sample_video_upscaled_x4.mp4
Scale factor: 4x
Resolution: 1920x1080 → 7680x4320
Frames processed: 900/900
FPS: 30.0
Duration: 30.0s
Processing time: 15 minutes
Encoding: AMD VCE (h264_amf)
======================================================================

Troubleshooting

"AMD encoding failed" Error

My code automatically retries with CPU encoding. If you see this message, the tool is switching from h264_amf to libx264.
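The retry behaviour can be sketched as a wrapper that runs the hardware command first and falls back to the CPU command when the exit code is non-zero. The function name and the injectable run parameter are assumptions made so the logic can be exercised without FFmpeg installed.

```python
import subprocess


def encode_with_fallback(primary_cmd: list[str], fallback_cmd: list[str],
                         run=subprocess.run) -> str:
    """Try the h264_amf command, then the libx264 command if it fails.

    Returns "primary" or "fallback" to record which encoder produced the file.
    """
    result = run(primary_cmd)
    if result.returncode == 0:
        return "primary"

    # A non-zero exit code from FFmpeg is treated as an encoder failure.
    result = run(fallback_cmd)
    if result.returncode != 0:
        raise RuntimeError("both hardware and CPU encoding failed")
    return "fallback"
```

Passing the two command lists in, instead of building them inside the function, keeps the retry logic independent of the exact encoder flags.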

High-Resolution CPU Fallback

Triggered when upscaled width > 4000px or height > 2000px
Normal behavior - AMD VCE has encoder limits (~4096px)
CPU encoding will be slower but still produces correct output

Out of Memory

Use lower scale factor (2x instead of 4x)
Close other GPU applications
Use EDSR instead of Real-ESRGAN (CPU-based)

Use Cases

Old Footage Restoration: Upscale low-resolution archive videos to modern standards
Screen Recordings: Enhance tutorial videos recorded at 720p to 1080p/4K
Surveillance Enhancement: Improve CCTV footage quality for detail analysis
Animation Sharpening: Upscale hand-drawn or CGI animation frames
Archival Projects: Restore damaged or heavily compressed videos
