Blog
Essays, experiments, and working notes. Mostly ML engineering and the infrastructure behind it.
mlops · vllm · gemma · gpu · runpod · llm-deployment
Deploying Gemma 4 26B A4B on an RTX 5090
Notes on standing up a private Gemma 4 26B A4B inference endpoint on an RTX 5090 with vLLM: the dead ends, the working setup, and the reasoning behind each decision.
meta
Hello, world
A first post: why I started writing, what you can expect here, and what I actually build when nobody is watching.