Gemma 4 Goes Multimodal, a VSCode Token Heist, and the Real Cost of AI at Scale
From Google's encoder-free multimodal model to a sneaky GitHub token-stealing bug in VSCode, here's what developers and AI adopters should be paying attention to this week.
The pace of AI development rarely slows down, but this week's stories are a reminder that progress comes with trade-offs — new capabilities, new security risks, and increasingly real questions about cost and infrastructure.
🔵 Gemma 4 12B: Multimodal Without the Encoder
Google released Gemma 4 12B, a unified, encoder-free multimodal model that handles text and image inputs within a single architecture, with no separate vision encoder required. This is a meaningful architectural shift: with one fewer component in the pipeline, encoder-free designs can be simpler to deploy and fine-tune, and they tend to have a smaller operational footprint. For developers building on-premise or resource-constrained multimodal applications, this model is worth benchmarking against whatever you run today.
🔴 1-Click GitHub Token Theft via a VSCode Bug
Security researcher Ammar Askar disclosed a serious vulnerability in VSCode that allows an attacker to steal a user's GitHub authentication token with a single click, with no complex social engineering required. If you use VSCode with GitHub authentication (and most of us do), update to the latest patched version now. This is the kind of supply-chain-adjacent risk that doesn't get enough attention: developer tooling is a high-value target, and your GitHub token is essentially your identity.
🟡 Microsoft Launches MAI-Code-1-Flash
Microsoft introduced MAI-Code-1-Flash, a new coding-focused model designed for speed and efficiency in agentic and IDE-integrated workflows. Early impressions suggest it's optimized for low-latency code completion and generation tasks rather than pure benchmark performance. As coding assistants move deeper into CI/CD pipelines and autonomous agents, having a fast, reliable model purpose-built for code becomes increasingly important.
🟠 Uber's $1,500/Month AI Cap Is Telling You Something
Simon Willison wrote a sharp analysis of Uber's decision to cap AI tool spending at $1,500 per employee per month, and of why that number is a useful signal for how companies are thinking about AI tool pricing and ROI. The framing is insightful: at that cap, companies are essentially asking, "what productivity gain justifies this cost?" It's a conversation every engineering team and CTO should be having right now, and it puts pressure on AI vendors to demonstrate concrete value rather than simply sell seat-based licenses.
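To make the "what productivity gain justifies this cost?" question concrete, here is a minimal back-of-the-envelope sketch. Only the $1,500 cap comes from the story above. The fully loaded monthly cost of an engineer and the working hours per month are hypothetical placeholders, so plug in your own numbers.

```python
def break_even_gain(monthly_ai_spend: float, monthly_loaded_cost: float) -> float:
    """Fraction of an employee's output the AI spend must add to pay for itself."""
    if monthly_loaded_cost <= 0:
        raise ValueError("monthly_loaded_cost must be positive")
    return monthly_ai_spend / monthly_loaded_cost


AI_CAP_USD = 1500.0            # the per-employee monthly cap from the story
LOADED_COST_USD = 15000.0      # hypothetical fully loaded monthly cost of one engineer
WORK_HOURS_PER_MONTH = 160.0   # hypothetical working hours per month

gain = break_even_gain(AI_CAP_USD, LOADED_COST_USD)
hours_to_save = gain * WORK_HOURS_PER_MONTH

print(f"Break-even productivity gain: {gain:.1%}")
print(f"Hours that must be saved per month: {hours_to_save:.1f}")
```

Under these assumed inputs, the tools have to save each engineer roughly a tenth of their time before the cap pays for itself. The interesting part is how quickly that threshold moves when the loaded cost changes.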
🟢 Use Your Nvidia GPU's VRAM as Linux Swap Space
A clever open-source project surfaced this week that lets you use your Nvidia GPU's VRAM as swap space on Linux via a network block device. For developers running large local models who constantly hit RAM limits, this is a creative workaround worth experimenting with — though it's best treated as a hack for dev environments rather than production systems.
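If you do try it, it helps to confirm what the kernel is actually swapping to. The sketch below does not use the project's own tooling, which isn't covered here. It simply parses Linux's `/proc/swaps` table, where sizes are reported in KiB. The `/dev/nbd0` name in the comment is a hypothetical example of how a network block device might show up, not something taken from the project.

```python
from pathlib import Path


def parse_proc_swaps(text: str) -> list[dict]:
    """Parse the contents of /proc/swaps into a list of swap entries."""
    entries = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) != 5:
            continue  # skip anything that doesn't match the expected layout
        name, kind, size_kib, used_kib, priority = fields
        entries.append({
            "name": name,               # e.g. a hypothetical /dev/nbd0
            "type": kind,               # "partition" or "file"
            "size_kib": int(size_kib),
            "used_kib": int(used_kib),
            "priority": int(priority),
        })
    return entries


if __name__ == "__main__":
    swaps_file = Path("/proc/swaps")
    if swaps_file.exists():  # only present on Linux
        for entry in parse_proc_swaps(swaps_file.read_text()):
            size_gib = entry["size_kib"] / (1024 * 1024)
            print(f'{entry["name"]}: {size_gib:.2f} GiB, '
                  f'{entry["used_kib"]} KiB used, priority {entry["priority"]}')
```

The priority column is worth watching: the kernel fills higher-priority swap areas first, so it determines whether a VRAM-backed device is used before or after your disk swap.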
My Take
The combination of better open models like Gemma 4, faster coding-specific models from Microsoft, and infrastructure tricks like VRAM-as-swap paints a clear picture: capable AI tooling, including serious local inference, is becoming genuinely accessible in 2026. Just don't forget to patch your VSCode while you're at it.

