RunPod Serverless Engineering
Migrated massive model weights out of ephemeral Docker layers into persistent volume mounts — serverless inference initialization dropped to under one second.

Muhammad Awais Khan — Systems-focused Cloud Platform & MLOps Engineer specializing in maximizing hardware efficiency, accelerating container orchestration, and eliminating serverless execution bottlenecks.
§ 01 · Signals
01 · RUNPOD · INFERENCE
80s → <1s
Serverless GenAI Cold Start
Cut cold-start time on RunPod from 80 s to under 1 s by migrating model weights from ephemeral Docker layers to persistent volume mounts.
02 · TEMPLIX · GPU
2× Throughput
Image-to-Video Pipeline
Doubled production image-to-video throughput for the Templix consumer app (50,000+ users) through custom kernel and memory tuning.
03 · DOCKER · CLOUD RUN
60% ↓
Container Footprint
Multi-stage Docker builds for an enterprise RAG engine shipped to GCP Cloud Run with rapid provisioning.
§ 02 · Production Track
Jul 2025 — Present
Owning the cloud + GPU substrate behind production GenAI products, including Templix — powering 50,000+ users with accelerated image-to-video pipelines.
Migrated large model weights out of ephemeral Docker layers into persistent volume mounts, cutting serverless inference initialization from 80 s to under one second.
Integrated custom SageAttention kernels and vmtouch file-system cache warming to maximize GPU memory bandwidth utilization.
Built high-volume asynchronous audio-to-text pipelines on OpenAI Whisper.
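A minimal sketch of the volume-mount pattern described in this track, assuming the weights sit on a mounted volume and are loaded once per worker rather than once per request. The mount point `/runpod-volume`, the `MODEL_VOLUME` variable, the `models/` subdirectory, the file name, and both helper functions are hypothetical and not taken from the production setup.

```python
import os
from pathlib import Path

# Hypothetical mount point and file name, chosen for illustration only.
VOLUME_ROOT = Path(os.environ.get("MODEL_VOLUME", "/runpod-volume"))
WEIGHTS_NAME = "model.safetensors"

# Per-process cache so a warm worker never reloads the same weights.
_MODEL_CACHE = {}


def resolve_weights(volume_root: Path = VOLUME_ROOT, name: str = WEIGHTS_NAME) -> Path:
    """Return the weights path on the mounted volume, failing fast if it is absent."""
    path = volume_root / "models" / name
    if not path.is_file():
        raise FileNotFoundError(f"weights not found on volume: {path}")
    return path


def get_model(loader, path: Path):
    """Load the model once per worker process and reuse it across requests.

    `loader` is any callable that turns a weights path into a model object.
    """
    key = str(path)
    if key not in _MODEL_CACHE:
        _MODEL_CACHE[key] = loader(path)
    return _MODEL_CACHE[key]
```

Because the weights are no longer baked into the image, the container pulls a much smaller set of layers at start, and a request handler can call `get_model(loader, resolve_weights())` without paying the load cost again on a warm worker.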
§ 03 · Systems Dossier
Implemented CUDA kernels from scratch, using shared-memory tiling and coalesced global-memory access to optimize matrix multiply-accumulate workloads.
Continuous integration with built-in performance drift detection, automatically triggering cloud GPU retraining with rolling, zero-downtime updates.
End-to-end topology synchronizing parallel video, speech metrics, and transcript evaluation streams into semantic performance signals.
Optimized container builds for an enterprise RAG engine, reducing the final image footprint by 60% for rapid cloud provisioning.
Predictive modeling pipeline across race & championship outcomes.
Operating-system level simulation in low-level assembly.
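The drift-triggered retraining entry in this dossier can be illustrated with a simple gate: compare a recent window of evaluation scores against a baseline window and flag drift when the mean falls by more than a tolerance. This is a sketch under assumptions, not the project's actual detector; the function name, the mean-based rule, and the 0.05 default are all hypothetical.

```python
from statistics import mean


def drift_detected(baseline, recent, tolerance=0.05):
    """Return True when the recent mean score is more than `tolerance` below the baseline mean.

    Both arguments are non-empty sequences of scores where higher is better.
    """
    if not baseline or not recent:
        raise ValueError("both score windows must be non-empty")
    return mean(baseline) - mean(recent) > tolerance
```

A CI job could evaluate this gate after each scoring run and start a retraining job only when it returns `True`, which keeps GPU spend tied to measured degradation rather than a fixed schedule.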
§ 04 · Tech Stack Map