🔍 File Scanner

Project Category: Individual Project (Personal Productivity Tool)

.NET Framework · PowerShell · Min-Heap Algorithm · BFS Traversal

High-performance file discovery tool for rapid filesystem analysis on Windows. It calls .NET Framework file-system APIs directly and combines a bounded min-priority queue with breadth-first directory traversal to find the largest files across an entire drive, running roughly 12-50x faster than PowerShell's built-in Get-ChildItem cmdlet in the benchmarks below.

📊 Benchmark Results

Tested on: 16-core CPU, 32 GB RAM, 1.82 TB HDD

| Scan Target | Files Discovered | Execution Time | Throughput |
| --- | --- | --- | --- |
| Complete C:\ drive (media filter) | 469,385 | 151 seconds | 3,107 files/sec |
| User Downloads folder (video filter) | 76 | 2.1 seconds | 36 files/sec |
| User Music folder (audio filter) | 621 | 0.07 seconds | 8,821 files/sec |

⚡ Performance Comparison

| Method | Time to Scan C:\ | Throughput | Relative Speed |
| --- | --- | --- | --- |
| File Scanner (.NET APIs) | 151 seconds | 3,107 files/sec | Baseline (1x) |
| PowerShell Get-ChildItem | 2,500+ seconds | ~188 files/sec | 16-50x slower |
| Get-ChildItem with filters | 1,800+ seconds | ~260 files/sec | ~12x slower |
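To reproduce a comparison like this on your own machine, a minimal timing sketch might look like the following. Measure-FileEnumeration is a hypothetical helper, not the author's benchmark harness: it counts the files under a folder once with Get-ChildItem -Recurse and once with [System.IO.Directory]::EnumerateFiles, and reports the elapsed seconds for each.

```powershell
# Hypothetical timing helper (not the author's harness). Counts files under
# $Path twice and reports how long each approach took.
function Measure-FileEnumeration {
    param([Parameter(Mandatory)][string]$Path)

    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    # Cmdlet route: wraps every file in a FileInfo object and pipes it
    $gciCount = @(Get-ChildItem -LiteralPath $Path -Recurse -File -Force -ErrorAction SilentlyContinue).Count
    $sw.Stop()
    $gciSeconds = $sw.Elapsed.TotalSeconds

    $sw.Restart()
    # .NET route: lazily yields path strings. It throws on the first
    # unreadable subfolder, so point it at a tree you can fully read.
    $netCount = 0
    foreach ($f in [System.IO.Directory]::EnumerateFiles($Path, '*', [System.IO.SearchOption]::AllDirectories)) {
        $netCount++
    }
    $sw.Stop()
    $netSeconds = $sw.Elapsed.TotalSeconds

    [pscustomobject]@{
        GetChildItemFiles   = $gciCount
        GetChildItemSeconds = $gciSeconds
        DotNetFiles         = $netCount
        DotNetSeconds       = $netSeconds
    }
}
```

For example, `Measure-FileEnumeration -Path "$env:USERPROFILE\Music"`. Run it more than once: the first pass warms the operating system's file cache, which favors whichever method runs second.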

🏗️ Core Architecture

1. Min-Heap (Priority Queue)

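Uses .NET's SortedSet<T> as a min-priority queue holding the top N largest files seen so far. SortedSet<T> is a red-black tree rather than a binary heap, but it exposes the smallest element via .Min and supports O(log K) insertion and removal, which is all the algorithm needs.
Time complexity: O(M log K), where M = total files scanned and K = number of top files tracked
Space complexity: O(K), independent of the total file count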
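As a rough illustration of what these bounds mean, take hypothetical values M = 10^6 files and K = 100 (round numbers chosen for the example, not taken from the benchmarks) and compare against sorting every file, ignoring constant factors:

```latex
\begin{aligned}
\text{Bounded set:}\quad & M \log_2 K = 10^6 \cdot \log_2 100 \approx 10^6 \cdot 6.64 \approx 6.6 \times 10^6 \ \text{comparisons (upper bound)}\\
\text{Full sort:}\quad & M \log_2 M = 10^6 \cdot \log_2 10^6 \approx 10^6 \cdot 19.93 \approx 2.0 \times 10^7 \ \text{comparisons}\\
\text{Memory:}\quad & K = 100 \ \text{retained entries, versus } M = 10^6 \ \text{for a full sort}
\end{aligned}
```

Once the set is full, a file that is not larger than the current minimum is rejected after a single size comparison, with no insertion or removal, so typical costs sit below the upper bound.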

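$minHeap = [System.Collections.Generic.SortedSet[object]]::new(
    [System.Collections.Generic.Comparer[object]]::Create({
        param($a, $b)
        # Ascending by size, so $minHeap.Min is the smallest tracked file
        $cmp = ([long]$a.Length).CompareTo([long]$b.Length)
        # SortedSet discards items that compare equal, so break ties on the
        # full path; otherwise different files of the same size would collide
        if ($cmp -eq 0) { $cmp = [string]::CompareOrdinal($a.FullName, $b.FullName) }
        $cmp
    })
)

# Insert if the set is not full, or if the file beats the current minimum
# ([void] suppresses the bool that Add/Remove would otherwise output)
if ($minHeap.Count -lt $TopCount) {
    [void]$minHeap.Add($fileInfo)
} elseif ($fileInfo.Length -gt $minHeap.Min.Length) {
    [void]$minHeap.Remove($minHeap.Min)   # evict the smallest tracked file
    [void]$minHeap.Add($fileInfo)
}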
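Because SortedSet<T> silently rejects an item that compares equal to one already stored (Add returns $false), a comparer that looks only at Length keeps just one of several same-sized files. The sketch below wraps a tie-breaking comparer and the insert logic in two hypothetical helpers, New-TopNSet and Add-TopN (illustrative names, not from the original script), so the behavior can be checked on plain objects that carry Length and FullName properties.

```powershell
# Hypothetical helpers (illustrative names, not from the original script).
function New-TopNSet {
    $comparer = [System.Collections.Generic.Comparer[object]]::Create({
        param($a, $b)
        # Primary key: size ascending, so .Min is the smallest tracked item
        $cmp = ([long]$a.Length).CompareTo([long]$b.Length)
        # Secondary key: path, so equal-sized files are not treated as duplicates
        if ($cmp -eq 0) { $cmp = [string]::CompareOrdinal($a.FullName, $b.FullName) }
        $cmp
    })
    # Leading comma stops PowerShell from unrolling the empty set into $null
    , [System.Collections.Generic.SortedSet[object]]::new($comparer)
}

function Add-TopN {
    param($Set, $Item, [int]$TopCount)
    if ($TopCount -le 0) { return }
    if ($Set.Count -lt $TopCount) {
        [void]$Set.Add($Item)
    } elseif ($Item.Length -gt $Set.Min.Length) {
        [void]$Set.Remove($Set.Min)   # evict the current smallest
        [void]$Set.Add($Item)
    }
}
```

Feeding items of sizes 10, 10, 30, 30, 20 through Add-TopN with -TopCount 3 leaves the 20 and both 30s; with a Length-only comparer, the second item of each size would have been dropped on insertion.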

2. Breadth-First Search (BFS)

Directory traversal uses a FIFO queue (breadth-first search) instead of recursive depth-first search.
Advantages:
• Prevents stack overflow on deeply nested trees (Windows can have 10,000+ nested directories)
• Processes all files in a directory before moving on to the next one
• Predictable memory usage: the queue holds at most O(D) pending paths, where D = directory count

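$queue = [System.Collections.Generic.Queue[string]]::new()
$queue.Enqueue($RootPath)

while ($queue.Count -gt 0) {
    $currentPath = $queue.Dequeue()
    try {
        # Queue subdirectories; FIFO order makes the walk breadth-first
        foreach ($dir in [System.IO.Directory]::EnumerateDirectories($currentPath)) {
            # Skip junctions/symlinks to avoid cycles and double counting
            if (-not ([System.IO.File]::GetAttributes($dir).HasFlag([System.IO.FileAttributes]::ReparsePoint))) {
                $queue.Enqueue($dir)
            }
        }
        foreach ($file in [System.IO.Directory]::EnumerateFiles($currentPath)) {
            $fileInfo = [System.IO.FileInfo]::new($file)
            # Process file (extension filter, heap insert)
        }
    } catch {
        # Skip directories that cannot be read (e.g. access denied)
    }
}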
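Putting the two pieces together, an end-to-end sketch might look like this. Find-LargestFiles, its default of 10 for -TopCount, and the case-insensitive extension matching are assumptions for illustration; the parameter names simply mirror the usage examples, and this is not the published script.

```powershell
# Hypothetical end-to-end sketch; names and defaults are illustrative.
function Find-LargestFiles {
    param(
        [Parameter(Mandatory)][string]$RootPath,
        [int]$TopCount = 10,           # assumed default
        [string[]]$FileTypes = @()     # e.g. '.mp4', '.mkv'; empty = all files
    )
    if ($TopCount -lt 1) { return }

    # Case-insensitive extension filter, so '.MP4' matches '.mp4'
    $extSet = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
    foreach ($ext in $FileTypes) { [void]$extSet.Add($ext) }

    # Ordered by size, then path (ties must not compare equal in a SortedSet)
    $top = [System.Collections.Generic.SortedSet[object]]::new(
        [System.Collections.Generic.Comparer[object]]::Create({
            param($a, $b)
            $cmp = ([long]$a.Length).CompareTo([long]$b.Length)
            if ($cmp -eq 0) { $cmp = [string]::CompareOrdinal($a.FullName, $b.FullName) }
            $cmp
        }))

    $queue = [System.Collections.Generic.Queue[string]]::new()
    $queue.Enqueue($RootPath)
    while ($queue.Count -gt 0) {
        $dir = $queue.Dequeue()
        try {
            foreach ($sub in [System.IO.Directory]::EnumerateDirectories($dir)) {
                # FIFO queue => breadth-first; skip junctions/symlinks to avoid cycles
                if (-not ([System.IO.File]::GetAttributes($sub).HasFlag([System.IO.FileAttributes]::ReparsePoint))) {
                    $queue.Enqueue($sub)
                }
            }
            foreach ($path in [System.IO.Directory]::EnumerateFiles($dir)) {
                if ($extSet.Count -gt 0 -and -not $extSet.Contains([System.IO.Path]::GetExtension($path))) { continue }
                $file = [System.IO.FileInfo]::new($path)
                if ($top.Count -lt $TopCount) {
                    [void]$top.Add($file)
                } elseif ($file.Length -gt $top.Min.Length) {
                    [void]$top.Remove($top.Min)   # evict the current smallest
                    [void]$top.Add($file)
                }
            }
        } catch {
            # Skip directories that cannot be read (access denied, removed mid-scan)
        }
    }
    # Emit largest first
    $top.Reverse()
}
```

For example, `Find-LargestFiles -RootPath "$env:USERPROFILE\Music" -TopCount 5 -FileTypes '.mp3','.flac' | Select-Object Length, FullName`. Filtering by extension on the path string before creating a FileInfo avoids a metadata lookup for every file that would be discarded anyway.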

💻 Usage

# Scan C:\ drive for top 73 largest media files
.\file_scanner.ps1 -RootPath "C:\" -TopCount 73 -FileTypes @(".mp4",".mkv",".avi")

# Scan Downloads folder for videos
.\file_scanner.ps1 -RootPath "$env:USERPROFILE\Downloads" -FileTypes @(".mp4",".mov")

# Quick scan of Pictures folder
.\file_scanner.ps1 -RootPath "$env:USERPROFILE\Pictures" -TopCount 20

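📦 Requirements

Windows with PowerShell and the .NET Framework. The code above uses the ::new() constructor syntax, which requires PowerShell 5.0 or later.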
