current-model-landscape — Petter's wiki

The current model landscape is a fast-moving mix of open-source and open-weight LLMs, multimodal releases, and production-serving improvements. The useful question is no longer whether open models are “good enough” in the abstract, but which model family, context window, license, and inference stack fit the workload and deployment constraints.

What the landscape looks like now

Open-source and open-weight models now cover reasoning, coding, multimodal input, and agentic workflows
The practical decision is a fit question: privacy, cost, control, latency, context length, and modality
Proprietary frontier models still matter, but the open stack is no longer a toy alternative
The model market changes fast enough that the best choice this quarter may be stale by the next release wave
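
The fit question above can be made concrete as a filter over candidate models. The following is a minimal sketch, not a recommended tool: the `Candidate` and `Workload` fields, the placeholder names `model-a` through `model-d`, and all numbers are hypothetical and do not describe any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                 # placeholder name, not a real model
    self_hostable: bool       # weights and license allow self-hosting
    context_tokens: int       # maximum context window
    modalities: frozenset     # e.g. {"text", "image"}
    cost_per_mtok: float      # hypothetical blended cost, USD per million tokens


@dataclass(frozen=True)
class Workload:
    needs_self_hosting: bool
    min_context_tokens: int
    required_modalities: frozenset
    max_cost_per_mtok: float


def fits(c: Candidate, w: Workload) -> bool:
    """Return True only if the candidate satisfies every hard constraint."""
    if w.needs_self_hosting and not c.self_hostable:
        return False
    if c.context_tokens < w.min_context_tokens:
        return False
    if not w.required_modalities <= c.modalities:
        return False
    return c.cost_per_mtok <= w.max_cost_per_mtok


def shortlist(candidates, w: Workload):
    """Keep the candidates that fit, cheapest first."""
    return sorted((c for c in candidates if fits(c, w)),
                  key=lambda c: c.cost_per_mtok)


# Hypothetical catalogue: every value below is invented for illustration.
CANDIDATES = [
    Candidate("model-a", True, 128_000, frozenset({"text"}), 0.5),
    Candidate("model-b", True, 32_000, frozenset({"text", "image"}), 0.3),
    Candidate("model-c", False, 200_000, frozenset({"text", "image"}), 2.0),
    Candidate("model-d", True, 256_000, frozenset({"text", "image"}), 1.2),
]

LONG_CONTEXT_PRIVATE = Workload(
    needs_self_hosting=True,
    min_context_tokens=100_000,
    required_modalities=frozenset({"text"}),
    max_cost_per_mtok=1.5,
)

if __name__ == "__main__":
    for c in shortlist(CANDIDATES, LONG_CONTEXT_PRIVATE):
        print(c.name, c.cost_per_mtok)
```

Treating privacy, context length, and modality as hard filters and cost as the sort key is one possible design choice. Latency and control could be added as further fields in the same way.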

The current release wave

GLM-5.1 is positioned as a long-horizon agentic model that can stay productive across hundreds of rounds and thousands of tool calls
Qwen3.6-27B claims flagship-level coding performance at a dense 27B size, which, if it holds up, suggests that smaller dense models are still improving
Gemma 4 pushes open-weight reasoning, coding, and multimodal work across several sizes, including on-device options
DeepSeek-V3.2 and the newer DeepSeek-V4 preview emphasize long context and efficient reasoning at very large scale
Kimi-K2.5 shows how early vision fusion and long context can turn a multimodal model into a serious agentic system
MiniMax-M2.7 and MiMo-V2-Flash show open models increasingly targeting agentic and coding-heavy workloads, each with a different parameter-efficiency tradeoff

How to choose a stack

Self-hosting matters when privacy, cost control, data residency, or long-term vendor independence is part of the requirement
Fine-tuning smaller models on proprietary data is often the highest-leverage way to get domain fit
A flexible inference layer matters as much as the model itself because the model frontier moves quickly
Inference optimizations such as batching, speculative decoding, and prefill/decode disaggregation are what make a high-quality model affordable to serve in a product
The best model for reasoning is not necessarily the best model for coding, retrieval, multimodal UI work, or on-device operation
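
One way to read the points about a flexible inference layer and per-task model choice is as a thin routing table between task types and backends. The sketch below assumes that shape. `Router`, `make_stub`, and names such as `coder-v1` are hypothetical, and the backends are stubs standing in for real inference clients.

```python
from typing import Callable, Dict

# A backend is anything that maps a prompt to a completion.
Backend = Callable[[str], str]


def make_stub(name: str) -> Backend:
    """Hypothetical stand-in for a real inference client."""
    return lambda prompt: f"[{name}] {prompt}"


class Router:
    """Maps task types to named backends so a model can be swapped in one place."""

    def __init__(self, default: str):
        self._backends: Dict[str, Backend] = {}
        self._routes: Dict[str, str] = {}
        self._default = default

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend

    def route(self, task: str, name: str) -> None:
        # Refuse routes to backends that were never registered.
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._routes[task] = name

    def resolve(self, task: str) -> str:
        # Tasks without an explicit route fall back to the default backend.
        return self._routes.get(task, self._default)

    def complete(self, task: str, prompt: str) -> str:
        return self._backends[self.resolve(task)](prompt)


if __name__ == "__main__":
    router = Router(default="general-v1")
    for name in ("general-v1", "coder-v1", "coder-v2"):
        router.register(name, make_stub(name))
    router.route("coding", "coder-v1")
    print(router.complete("coding", "write a parser"))

    # A new release arrives: only the route changes, not the callers.
    router.route("coding", "coder-v2")
    print(router.complete("coding", "write a parser"))
    print(router.complete("summarize", "long document"))
```

Because callers name the task rather than the model, swapping in a newer release becomes a one-line route change instead of a change at every call site.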

Limits and counterarguments

Yann LeCun’s lecture, as circulated in the raw feed, argues that the next trillion-dollar AI company will not be built on LLMs alone and that scaling is not the full answer
That does not negate the current open-model wave, but it does warn against assuming the present architecture race is the final form of AI progress
Some raw posts in the feed were low-signal teasers or bare links, which is itself a reminder that release noise is now part of the model market