These models are not on the current leaderboard, but things have already changed since last quarter:

- [ ] Qwen3 Coder series
- [ ] Qwen3 dense and MoE models
- [ ] Qwen3 distilled models
- [ ] GLM 4.5, both the original and Air models
- [ ] DeepSeek v3.1 base
- [ ] DeepSeek-R1-0528

Other suggestions:

- [ ] Kimi K2 and Kimi Coder are probably worth including even if they fail on some tasks
- [ ] The GPT-OSS model series is good to compare as US representation
- [ ] Maybe LG's Exaone, Meta's Llama 4, and MiniMax, if there is enough time