Project Category: Individual Project (Personal Productivity Tool)
I engineered a frame-by-frame video upscaling pipeline optimized for AMD GPUs on Windows. My architecture supports two AI methods (GAN-based Real-ESRGAN and CNN-based EDSR), implements AMD VCE hardware encoding with automatic CPU fallback for high-resolution outputs (>4000px wide), and processes frames sequentially to bypass VRAM limitations. The tool handles 30+ minute videos at up to 4x upscale (e.g., 1920x1080 → 7680x4320) while preserving audio tracks.
Real-ESRGAN is a Generative Adversarial Network (GAN) trained on degraded images to learn realistic upscaling patterns. It uses Vulkan acceleration for GPU inference without a CUDA dependency. Each frame is processed individually via the realesrgan-ncnn-vulkan.exe executable, which takes 2-5 seconds per frame and recovers more detail than classical methods.
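A minimal sketch of how the per-frame invocation might be assembled. The flags `-i`, `-o`, `-s`, `-n`, and `-f` are the ones documented for realesrgan-ncnn-vulkan; the function names and the default model name `realesr-animevideov3` are assumptions made for illustration, not taken from the actual script.

```python
from pathlib import Path
from typing import List


def build_realesrgan_cmd(exe: str, src: Path, dst: Path, scale: int,
                         model: str = "realesr-animevideov3") -> List[str]:
    """Return the argument list that upscales a single frame."""
    if scale not in (2, 3, 4):
        raise ValueError(f"unsupported scale: {scale}")
    return [
        exe,
        "-i", str(src),    # input frame
        "-o", str(dst),    # output frame
        "-s", str(scale),  # upscale ratio
        "-n", model,       # model name (assumed default, see lead-in)
        "-f", "png",       # output image format
    ]


def plan_frame_jobs(frames_dir: Path, out_dir: Path, exe: str,
                    scale: int) -> List[List[str]]:
    """Build one command per PNG, sorted so frame order is preserved."""
    return [
        build_realesrgan_cmd(exe, frame, out_dir / frame.name, scale)
        for frame in sorted(frames_dir.glob("*.png"))
    ]
```

Each command in the returned list can then be handed to `subprocess.run` one at a time, so only a single frame is in flight at any moment.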
EDSR (Enhanced Deep Super-Resolution) is a Convolutional Neural Network trained on high-resolution image datasets. It uses residual scaling to preserve details during upscaling. It requires more processing time (15-30 seconds per frame) but runs on the CPU, making it suitable for systems without compatible GPUs.
AMD VCE (Video Coding Engine) provides hardware-accelerated H.264 encoding on AMD GPUs via the Advanced Media Framework (AMF). VCE offloads video encoding from the CPU to the GPU's dedicated encoding blocks, achieving real-time encoding speeds (30+ FPS). My implementation uses FFmpeg's h264_amf encoder with automatic fallback to libx264 CPU encoding when VCE fails (common with videos wider than 4000px or taller than 2000px).
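The fallback behaviour described above could be sketched roughly as follows. The encoder arguments and the 4000px/2000px thresholds are the ones quoted in this write-up; the function names, the injectable `run` callback, and the choice to skip VCE entirely above the thresholds are assumptions made for illustration.

```python
import subprocess
from typing import Callable, Sequence

# Encoder arguments quoted in this write-up.
AMF_ARGS = ["-c:v", "h264_amf", "-quality", "quality", "-rc", "cqp", "-qp", "18"]
X264_STANDARD_ARGS = ["-c:v", "libx264", "-preset", "fast", "-crf", "18"]
X264_HIGH_RES_ARGS = ["-c:v", "libx264", "-preset", "ultrafast", "-crf", "28"]


def exceeds_vce_limits(width: int, height: int) -> bool:
    """Thresholds from the text: wider than 4000px or taller than 2000px."""
    return width > 4000 or height > 2000


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (0 means success)."""
    return subprocess.run(list(cmd)).returncode


def encode_with_fallback(
    input_args: Sequence[str],
    output: str,
    width: int,
    height: int,
    run: Callable[[Sequence[str]], int] = run_command,
) -> str:
    """Try h264_amf first, then libx264; return the encoder that succeeded."""
    high_res = exceeds_vce_limits(width, height)
    if not high_res:
        # Assumption: VCE is only attempted below the resolution thresholds.
        if run(["ffmpeg", "-y", *input_args, *AMF_ARGS, output]) == 0:
            return "h264_amf"
    # Either VCE was skipped or it exited with a non-zero code: use the CPU.
    x264_args = X264_HIGH_RES_ARGS if high_res else X264_STANDARD_ARGS
    if run(["ffmpeg", "-y", *input_args, *x264_args, output]) != 0:
        raise RuntimeError("CPU encoding with libx264 failed")
    return "libx264"
```

Passing the runner as a parameter keeps the retry logic testable without FFmpeg or an AMD GPU present: a stub that returns a non-zero code for h264_amf is enough to exercise the fallback path.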
FFmpeg handles video demuxing (extracting frames and audio), classical upscaling (Lanczos interpolation), and video muxing (reassembling frames and audio). OpenCV reads video metadata (FPS, resolution, frame count) and performs frame verification after upscaling. I use FFmpeg's -c:v h264_amf for AMD GPU encoding and -c:a aac for audio preservation.
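As a rough illustration of the demux and mux roles described above, the FFmpeg calls could be built as plain argument lists. The flags (`-vf fps=`, `-vn -acodec copy`, `-c copy`) and the `%06d.png` pattern are the ones quoted in this write-up; the function names and path parameters are hypothetical.

```python
from typing import List


def extract_frames_cmd(video: str, frames_dir: str, fps: float) -> List[str]:
    """Dump every frame to numbered PNG files at the source frame rate."""
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}",
            f"{frames_dir}/%06d.png"]


def extract_audio_cmd(video: str, audio_out: str) -> List[str]:
    """Copy the audio stream out of the source without re-encoding it."""
    return ["ffmpeg", "-i", video, "-vn", "-acodec", "copy", audio_out]


def mux_cmd(video: str, audio: str, output: str) -> List[str]:
    """Combine the upscaled video and the original audio by stream copy."""
    return ["ffmpeg", "-i", video, "-i", audio, "-c", "copy", output]
```

Returning lists rather than shell strings avoids quoting problems with paths that contain spaces when the commands are passed to `subprocess.run`.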
I designed this sequential processing system to handle unlimited video lengths without VRAM constraints:
1. Extract all video frames to PNG files using FFmpeg -vf fps=original_fps, storing them in timestamped directories (e.g., output/20251018-1200-12345_frames/)
2. Process each frame individually with Real-ESRGAN or EDSR. My progress bar tracks frame_current/frame_total with real-time ETA calculation
3. My code checks upscaled frame dimensions using OpenCV imread() to ensure width == original_width * scale before reassembly
4. Combine upscaled frames using FFmpeg with AMD VCE encoding. My fallback logic detects VCE failures via exit codes and retries with CPU encoding
5. Extract original audio with -vn -acodec copy, then mux back using -i upscaled_video.mp4 -i original_audio.aac -c copy
6. Prompt user to keep/delete frame folders. My batch interface asks: "Keep extracted frames? (y/n)" with automatic cleanup on 'n'
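The frame verification and ETA steps above can be sketched as below. The write-up reads frames with OpenCV imread(); to keep the example dependency-free, a standard-library read of the PNG header is swapped in here. The function names are hypothetical, and checking the height as well as the width is an added assumption.

```python
import struct
from pathlib import Path
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> Tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    if (len(header) < 24 or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        raise ValueError(f"not a valid PNG: {path}")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def frame_is_upscaled(path: Path, original_width: int,
                      original_height: int, scale: int) -> bool:
    """True when the frame has exactly the expected upscaled dimensions."""
    expected = (original_width * scale, original_height * scale)
    return png_size(path) == expected


def eta_seconds(frames_done: int, frames_total: int, elapsed: float) -> float:
    """Estimate remaining time from the average time per finished frame."""
    if frames_done <= 0:
        return float("inf")
    return elapsed / frames_done * (frames_total - frames_done)
```

Reading only the first 24 bytes avoids decoding a full 7680x4320 image just to confirm its size, which matters when tens of thousands of frames are checked.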
Original Approach (Failed): Load entire video into VRAM → Process all frames in batch → Output video
Issue: 30-minute 1080p video = ~54,000 frames × 8MB/frame = 432GB VRAM required (impossible on consumer GPUs)
My Solution: Frame-by-frame sequential processing
Extract frames to disk (FFmpeg)
Process one frame at a time (Real-ESRGAN reads frame → upscales → saves → releases memory)
Reassemble from disk using FFmpeg
Result: Only one frame (~8MB of input) is held at a time, so peak VRAM usage no longer grows with video length. The tool can now process 2-hour videos on 8GB GPUs.
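The arithmetic behind the 432GB figure can be reproduced with a short calculation. This is a worked example only: it assumes 30 fps (which is what ~54,000 frames in 30 minutes implies), the ~8MB per frame quoted above, and decimal gigabytes (1GB = 1000MB); the function names are hypothetical.

```python
def frame_count(duration_seconds: float, fps: float) -> int:
    """Number of frames in a clip of the given length and frame rate."""
    return round(duration_seconds * fps)


def batch_memory_gb(frames: int, mb_per_frame: float = 8.0) -> float:
    """Memory needed to hold every frame at once, in decimal gigabytes."""
    return frames * mb_per_frame / 1000


# 30-minute video at an assumed 30 fps.
frames = frame_count(30 * 60, 30)
print(frames, batch_memory_gb(frames))  # 54000 432.0
```

The sequential approach replaces the `frames` factor with 1, which is why the per-frame footprint stays constant while the batch figure scales linearly with duration.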
Real-ESRGAN: realesrgan-ncnn-vulkan.exe with model selection based on scale (2x/3x/4x)
EDSR: EDSR_x{scale}.pb models via OpenCV DNN module
I engineered automatic encoding strategy selection based on resolution:
# Primary encoding strategy (AMD GPU)
ffmpeg -framerate {fps} -i frames/%06d.png -i audio.aac \
-c:v h264_amf -quality quality -rc cqp -qp 18 \
-c:a aac -b:a 192k output.mp4
h264_amf (AMD VCE hardware encoder)
-rc cqp (Constant Quantization Parameter for quality)
-qp 18 (visually lossless)
# Triggered when width > 4000px OR height > 2000px
ffmpeg -framerate {fps} -i frames/%06d.png -i audio.aac \
-c:v libx264 -preset ultrafast -crf 28 \
-c:a aac -b:a 192k output.mp4
libx264 (CPU-based H.264 encoder)
-preset: ultrafast for high-res, fast for standard-res
-crf 28 for high-res (balanced size/quality), -crf 18 for standard-res
Scale | Method | Processing Speed | Quality | GPU Usage |
---|---|---|---|---|
2x | Real-ESRGAN | ~5-10 fps | Excellent | Vulkan |
3x | Real-ESRGAN | ~2-5 fps | Excellent | Vulkan |
4x | Real-ESRGAN | ~1-3 fps (Recommended) | Excellent | Vulkan |
2x | EDSR (AI CNN) | ~15-30 fps | Good | CPU |
4x | FFmpeg Classical | ~5-15 fps | Fair | CPU |
Note: Speeds measured with 1920x1080 input on an AMD RX 6800 XT. Processing time (seconds) = total_frames / processing speed (fps); for example, 900 frames at ~1 fps take about 15 minutes.
# From helper_tools root
video_upscaler.bat
py video_upscaler.py "input_video.mp4"
# AI upscaling with custom scale
py video_upscaler.py input.mp4 --method realesrgan --scale 4
# Fast classical upscaling (FFmpeg Lanczos)
py video_upscaler.py input.mp4 --method ffmpeg --scale 2
# Extract frames only (for manual processing)
py video_upscaler.py input.mp4 --method extract
# Process existing frames to GIF
py video_upscaler.py input.mp4 --method process_existing --format gif
# Custom output path
py video_upscaler.py input.mp4 --output enhanced_video.mp4
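For readers who want to reproduce a similar command-line interface, a parser accepting the flags used above might look like the sketch below. Only the flag names and the method values realesrgan, ffmpeg, extract, and process_existing come from the examples; the `edsr` value, every default, and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="video_upscaler.py")
    parser.add_argument("input", help="path to the input video")
    parser.add_argument(
        "--method",
        # "edsr" is an assumed spelling; the other values appear above.
        choices=["realesrgan", "edsr", "ffmpeg", "extract", "process_existing"],
        default="realesrgan",  # assumed default
    )
    parser.add_argument("--scale", type=int, choices=[2, 3, 4],
                        default=4)  # assumed default
    parser.add_argument("--format", default="mp4",
                        help="output container, e.g. mp4 or gif")
    parser.add_argument("--output", default=None,
                        help="custom output path")
    return parser


args = build_parser().parse_args(["input.mp4", "--method", "ffmpeg", "--scale", "2"])
print(args.method, args.scale)  # ffmpeg 2
```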
video_upscaler/
    video_upscaler.py                        # My main script
    realesrgan-windows/                      # Vulkan executables
        realesrgan-ncnn-vulkan.exe
        models/
            realesr-animevideov3-x2.param
            realesr-animevideov3-x4.param
            realesrgan-x4plus.param
    models/                                  # EDSR AI models
        EDSR_x2.pb
        EDSR_x3.pb
        EDSR_x4.pb
    input/                                   # Place input videos here
    output/                                  # Enhanced videos and frame folders
        video_name_upscaled_x4.mp4
        20251018-1200-12345_frames/
        20251018-1200-12345_frames_upscaled_x4/
======================================================================
Video Upscaling: sample_video.mp4
Scale: 4x | Method: Real-ESRGAN
Output: sample_video_upscaled_x4.mp4
======================================================================
Video Info:
Resolution: 1920x1080
FPS: 30.0
Total frames: 900
Duration: 30.0s (0.5 minutes)
Target resolution: 7680x4320
Processing 900 frames...
[] 100.0% - Frame 900/900 7680x4320
Reassembling video with AMD VCE encoding...
Preserving original audio track...
======================================================================
VIDEO UPSCALING COMPLETE
======================================================================
Input: sample_video.mp4
Output: sample_video_upscaled_x4.mp4
Scale factor: 4x
Resolution: 1920x1080 → 7680x4320
Frames processed: 900/900
FPS: 30.0
Duration: 30.0s
Processing time: 15 minutes
Encoding: AMD VCE (h264_amf)
======================================================================
My code automatically retries with CPU encoding. If you see this message, the tool is switching from h264_amf to libx264.