Gemma 4 Goes Multimodal, a VSCode Token Heist, and the Real Cost of AI at Scale

The pace of AI development rarely slows down, but this week's stories are a reminder that progress comes with trade-offs — new capabilities, new security risks, and increasingly real questions about cost and infrastructure.

🔵 Gemma 4 12B: Multimodal Without the Encoder

Google released Gemma 4 12B, a unified, encoder-free multimodal model that handles text and image inputs within a single architecture, with no separate vision encoder required. This is a meaningful architectural shift: one network instead of a vision encoder attached to a language model means fewer components to deploy, version, and fine-tune, which can translate into a smaller operational footprint. For developers building on-premise or resource-constrained multimodal applications, this is a model worth benchmarking immediately.

→ Read the announcement

🔴 1-Click GitHub Token Theft via a VSCode Bug

Security researcher Ammar Askar disclosed a serious vulnerability in VSCode that allows an attacker to steal a user's GitHub authentication token with a single click — no complex social engineering required. If you're using VSCode with GitHub authentication (and most of us are), make sure you're running the latest patched version immediately. This is the kind of supply-chain-adjacent risk that doesn't get enough attention: developer tooling is a high-value target, and your GitHub token is essentially your identity.

→ Read the full disclosure

🟡 Microsoft Launches MAI-Code-1-Flash

Microsoft introduced MAI-Code-1-Flash, a new coding-focused model designed for speed and efficiency in agentic and IDE-integrated workflows. Early impressions suggest it's optimized for low-latency code completion and generation tasks rather than pure benchmark performance. As coding assistants move deeper into CI/CD pipelines and autonomous agents, having a fast, reliable model purpose-built for code becomes increasingly important.

→ Learn more

🟠 Uber's $1,500/Month AI Cap Is Telling You Something

Simon Willison wrote a sharp analysis of Uber's decision to cap AI tool spending at $1,500 per employee per month, and of why that number is a useful signal of how companies are thinking about AI tool pricing and ROI. The framing is insightful: with a cap in place, companies are essentially asking, "what productivity gain justifies this cost?" It's a conversation every engineering team and CTO should be having right now, and it pushes AI vendors to demonstrate concrete value instead of relying on seat-based licensing alone.
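To make the "what productivity gain justifies this cost?" question concrete, here is a back-of-the-envelope break-even sketch. The $1,500 cap comes from the story; the fully loaded hourly engineer costs and the 160-hour working month are illustrative assumptions, not figures from Uber or from Simon's post.

```python
# Back-of-the-envelope break-even for a monthly AI tool spending cap.
# The $1,500 cap is from the Uber story; the hourly costs and the
# 160-hour working month below are illustrative assumptions only.

MONTHLY_CAP_USD = 1_500.0
WORKING_HOURS_PER_MONTH = 160.0  # assumption: ~40 h/week


def break_even_hours(monthly_spend_usd: float, loaded_hourly_cost_usd: float) -> float:
    """Engineer-hours per month the tools must save to pay for themselves."""
    if loaded_hourly_cost_usd <= 0:
        raise ValueError("loaded_hourly_cost_usd must be positive")
    return monthly_spend_usd / loaded_hourly_cost_usd


def break_even_share(hours_saved: float, working_hours: float = WORKING_HOURS_PER_MONTH) -> float:
    """Those saved hours as a fraction of a working month."""
    return hours_saved / working_hours


if __name__ == "__main__":
    # Assumed fully loaded costs (salary + benefits + overhead), in USD per hour.
    for hourly in (75.0, 100.0, 150.0):
        hours = break_even_hours(MONTHLY_CAP_USD, hourly)
        share = break_even_share(hours)
        print(f"${hourly:.0f}/h: {hours:.1f} h saved per month ({share:.1%} of the month) to break even")
```

Under these assumptions, an engineer costing $100/hour needs the tools to save about 15 hours a month, a little under 10% of their time, for the cap to pay off. Plug in your own numbers; the point is that the cap converts directly into a productivity target.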

→ Read Simon's take

🟢 Use Your Nvidia GPU's VRAM as Linux Swap Space

A clever open-source project surfaced this week that lets you use your Nvidia GPU's VRAM as swap space on Linux via a network block device. For developers running large local models who constantly hit RAM limits, this is a creative workaround worth experimenting with — though it's best treated as a hack for dev environments rather than production systems.
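The project's own setup steps live in its repo and aren't reproduced here. Once an NBD-backed swap device is active, though, the kernel lists it in /proc/swaps like any other swap area, so a quick sanity check is to parse that file. This is a minimal sketch; the `/dev/nbd` device-name prefix is an assumption about how the device shows up on your system, and the helper names are hypothetical.

```python
from pathlib import Path


def parse_proc_swaps(text: str) -> list[dict]:
    """Parse /proc/swaps: a header row, then one row per active swap area.

    Columns are Filename, Type, Size, Used, Priority; sizes are in KiB.
    """
    entries = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore malformed or empty rows
        filename, swap_type, size_kib, used_kib, priority = fields[:5]
        entries.append({
            "filename": filename,
            "type": swap_type,
            "size_kib": int(size_kib),
            "used_kib": int(used_kib),
            "priority": int(priority),
        })
    return entries


def nbd_swap_entries(entries: list[dict], prefix: str = "/dev/nbd") -> list[dict]:
    """Keep only swap areas backed by a network block device (assumed naming)."""
    return [e for e in entries if e["filename"].startswith(prefix)]


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    if swaps.exists():
        for e in nbd_swap_entries(parse_proc_swaps(swaps.read_text())):
            print(f"{e['filename']}: {e['used_kib']} of {e['size_kib']} KiB used, priority {e['priority']}")
    else:
        print("/proc/swaps not found (not running on Linux?)")
```

The Priority column is worth watching: the kernel fills higher-priority swap areas first, so it tells you whether VRAM-backed swap will be preferred over, or held in reserve behind, an existing disk swap partition.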

→ Check out the repo

My Take

The combination of better open models like Gemma 4, faster coding-specific models from Microsoft, and infrastructure tricks like VRAM-as-swap paints a clear picture: serious AI inference, including on local hardware, is becoming genuinely accessible in 2026. Just don't forget to patch your VSCode while you're at it.