Blog
Essays, experiments, and working notes. Mostly ML engineering and the infrastructure behind it.
mlops · vllm · gemma · gpu · runpod · llm-deployment
Deploying Gemma 4 26B A4B on an RTX 5090
Notes on standing up a private Gemma 4 26B A4B inference endpoint on an RTX 5090 with vLLM: the dead ends, the working setup, and the reasoning behind each decision.
meta
Hello, world
A first post: why I started writing, what you can expect here, and what I actually build when nobody is watching.