Prompt InsightsOpen Prompt Builder

Models

GLM-5.2 Becomes the New Open-Weights Leader, Beating GPT-5.5 on Agentic Knowledge Work

Z.ai's GLM-5.2, a 753B-parameter MoE under the MIT license, is now the top open-weights model on the Artificial Analysis Intelligence Index and scored above GPT-5.5 on the new AA-Briefcase agentic eval. The frontier-grade option you can self-host just shifted.

2 min read
Photo: Unsplash

Z.ai has released GLM-5.2, now the leading open-weights model on the Artificial Analysis Intelligence Index. It is a roughly 753B-parameter Mixture-of-Experts model (about 40B active) shipped under the permissive MIT license, and on Artificial Analysis' new AA-Briefcase agentic knowledge-work eval it scored above GPT-5.5. Z.ai positions it as the most powerful text-only open-weights LLM, built for long-horizon tasks, and it tops its cohort on the new agentic benchmark (alongside Claude Fable in a separate cohort).

Why it matters

For the first time, the strongest open-weights model is not just competitive on static benchmarks but ahead of a closed frontier model on real agentic knowledge work. As Simon Willison notes, GLM-5.2 leads the Intelligence Index among open models while costing a fraction of GPT-5.5 on hosted endpoints. The MIT license removes the usual asterisks: no regional limits, full commercial and research use, and the freedom to self-host. If you have been waiting for an open model you can actually deploy for serious agent workloads, this is it. It is the clearest signal yet that open weights are now a frontier story, not a budget alternative.

The best agentic model you can download and own now beats a leading closed model on knowledge work.

What changes in practice

  • Self-hosting is viable for top-tier agents. A 1M-token context and MoE efficiency mean long-horizon trajectories run without a closed API in the loop.
  • Cost math flips. Hosted GLM-5.2 runs near $1.40 in / $4.40 out per million tokens, versus roughly $5 / $30 for GPT-5.5.
  • Token budgets grow. GLM-5.2 consumes about 43k output tokens per Intelligence Index task, well above leaner models, so reasoning depth is not free.
  • Text-only is fine for code. No image input, yet it ranks second on Code Arena WebDev behind only Claude Fable, much like the text-first tradeoff seen in Gemma 4 12B discussions.

How to use it

  1. Pull the weights from Hugging Face or ModelScope and serve with vLLM, SGLang, or transformers for full control.
  2. Start hosted via OpenRouter or Z.ai to benchmark on your own tasks before committing infra.
  3. Tune reasoning effort. Use High or Max levels for long-horizon agent runs, and dial down for cheap routine calls to control that 43k-token tail.
  4. Wire it into existing agent harnesses (Claude Code, ZCode, OpenCode) since it slots into standard tool-calling loops.
  5. Watch your output token meter in production: depth is the point, but it is also the bill.

The open-weights frontier just caught up, and it fits on your own hardware.

READY TO ASCEND

Get AI news that respects your time

The signal, distilled. Curated AI news and prompt-engineering insight. No noise.

More in Models