Code quality and tool-calling reliability. 272 models tested. Updated daily.
How we test: 10 tasks · scoring · methodology
[Scatter plot: each dot is a model. Color = score (green good, red bad). Size = relative model size. X-axis is log scale.]
Updated: 2026-06-17T16:25:38
| Model (flag = region) | Type | Score ↓ | Pass/Partial/Fail | Cost |
|---|---|---|---|---|
| 🇺🇸SmolLM3 3B (4-bit 1.8GB) | local | 93 | 9/0/1 | local |
| 🇺🇸Nemotron 3 Nano 30B A3B | cloud | 90 | 8/2/0 | $1.00/M |
| 🇪🇺Codestral 2508 | cloud | 90 | 8/2/0 | $1.00/M |
| 🇨🇳MiniMax M2 Her | cloud | 90 | 8/2/0 | $1.20/M |
| 🇨🇳DeepSeek Chat | cloud | 90 | 8/2/0 | $1.00/M |
| 🇨🇳Qwen3 Coder 30B A3B | cloud | 90 | 8/2/0 | $1.00/M |
| 🇪🇺Mistral Large 2411 | cloud | 90 | 8/2/0 | $1.00/M |
| 🇨🇳DeepSeek Chat V3-0324 | cloud | 90 | 8/2/0 | $1.00/M |
| 🇺🇸Amazon Nova 2 Lite | cloud | 90 | 8/2/0 | $1.00/M |
| 🇺🇸Granite 4.0 Micro | cloud | 90 | 8/2/0 | $1.00/M |
| 🇨🇳Tencent Hunyuan A13B | cloud | 90 | 8/2/0 | $1.00/M |
| 🇪🇺Ministral 3 3B 2512 | cloud | 90 | 8/2/0 | $1.00/M |
| 🇺🇸GPT-OSS 20B (free) | cloud | 90 | 8/2/0 | $1.00/M |
| 🇨🇳Qwen3.5 9B 5-bit (Q5_K_M 6.1GB) | local | 90 | 9/0/1 | local |
| 🇺🇸Phi-4-mini (4-bit 2.3GB) | local | 90 | 9/0/1 | local |
| 🇺🇸IBM Granite 4.1 8B | cloud | 90 | 9/0/1 | $1.00/M |
| 🇪🇺Falcon3-7B-Instruct-4bit (4-bit 3.8GB) | local | 88 | 8/1/1 | local |
| 🇺🇸Claude Sonnet 4 | cloud | 85 | 7/3/0 | $6.60/M |
| 🇨🇳Qwen2.5 1.5B (4-bit 0.9GB) | local | 85 | 7/3/0 | local |
| 🇨🇳Qwen2.5 3B (4-bit 1.8GB) | local | 85 | 7/3/0 | local |
| 🌍Seed 2.0 Mini | cloud | 85 | 7/3/0 | free |
| 🇨🇳MiMo V2 Flash | cloud | 85 | 7/3/0 | free |
| 🇺🇸Gemini 2.5 Flash Lite | cloud | 85 | 7/3/0 | $0.40/M |
| 🇨🇳Qwen3 Coder Flash | cloud | 85 | 7/3/0 | $1.50/M |
| 🌍Cogito V2.1 671B | cloud | 85 | 7/3/0 | $1.50/M |
| 🇺🇸Claude Opus 4.6 | cloud | 85 | 7/3/0 | free |
| 🇨🇳Qwen3 Max | cloud | 85 | 7/3/0 | $1.00/M |
| 🌍EssentialAI RNJ-1 | cloud | 85 | 7/3/0 | $0.15/M |
| 🇺🇸Amazon Nova Micro v1 | cloud | 85 | 7/3/0 | $0.14/M |
| 🇪🇺Mistral Small 3.2 | cloud | 85 | 7/3/0 | $1.00/M |
| 🇺🇸LFM 2 24B A2B | cloud | 85 | 7/3/0 | $1.00/M |
| 🇨🇳Qwen3.7 Max | cloud | 85 | 7/3/0 | $1.00/M |
| 🇨🇳Qwen3.5 Plus (2026-04-20) | cloud | 85 | 7/3/0 | $1.00/M |
| 🇪🇺Mistral Medium 3.5 | cloud | 85 | 7/3/0 | $1.00/M |
| 🇪🇺Ministral 8B | cloud | 85 | 7/3/0 | $1.00/M |
| 🇨🇳DeepSeek V3.1 Terminus | cloud | 85 | 7/3/0 | $1.00/M |
| 🇺🇸Gemma 3N E4B | cloud | 85 | 7/3/0 | $1.00/M |
| 🇨🇳Qwen3 Max Thinking | cloud | 85 | 7/3/0 | $1.00/M |
| 🇺🇸GPT-5.1 Chat | cloud | 85 | 7/3/0 | $1.00/M |
| 🇺🇸LFM2.5 1.2B Instruct (free) | cloud | 85 | 7/3/0 | $1.00/M |
| 🇺🇸Phi-4 | cloud | 85 | 7/3/0 | $1.00/M |
| 🇨🇳MiniMax-01 | cloud | 85 | 7/3/0 | $1.00/M |
| 🇺🇸GPT-OSS 120B (free) | cloud | 85 | 7/3/0 | $1.00/M |
| 🇪🇺Mistral Small 24B 2501 | cloud | 85 | 7/3/0 | $1.00/M |
| 🇨🇳Qwen Plus | cloud | 85 | 8/1/1 | free |
| 🌍L3.1 Euryale 70B | cloud | 85 | 8/1/1 | $1.00/M |
| 🌍L3 Lunaris 8B | cloud | 85 | 8/1/1 | $1.00/M |
| 🌍Anthracite Magnum V4 72B | cloud | 85 | 8/1/1 | $1.00/M |
| 🇺🇸Gemma 3 27B | cloud | 85 | 8/1/1 | $1.00/M |
| 🌍Skyfall 36B v2 | cloud | 85 | 8/1/1 | $1.00/M |
| 🇪🇺Mistral Devstral 2 | cloud | 85 | 8/1/1 | $1.30/M |
| 🇺🇸Granite 3.2 2B (4-bit 1.5GB) | local | 83 | 7/2/1 | local |
| 🇪🇺Ministral 3B (4-bit 2.0GB) | local | 82 | 8/1/1 | local |
| 🌍AionLabs: Aion-2.0 | cloud | 82 | 7/1/2 | $1.52/M |
| 🇺🇸GPT-4.1 | cloud | 82 | 8/1/1 | $7.60/M |
| 🌍Kimi K2.5 | cloud | 80 | 6/4/0 | free |
| 🇺🇸Claude Sonnet 4.6 | cloud | 80 | 6/4/0 | $15.00/M |
| 🇺🇸Gemini 2.5 Pro | cloud | 80 | 6/4/0 | $10.00/M |
| 🌍Kat Coder Pro V2 | cloud | 80 | 6/4/0 | $1.00/M |
| 🇨🇳Qwen3 Coder | cloud | 80 | 6/4/0 | $1.80/M |
| 🇺🇸Claude Opus 4.6 Fast | cloud | 80 | 6/4/0 | free |
| 🇪🇺Ministral 14B 2512 | cloud | 80 | 6/4/0 | $1.00/M |
| 🌍Kimi K2 0905 | cloud | 80 | 6/4/0 | $1.00/M |
| 🇨🇳Qwen3 30B A3B 2507 | cloud | 80 | 6/4/0 | $1.00/M |
| 🌍Nex-AGI N1 | cloud | 80 | 6/4/0 | $0.50/M |
| 🇨🇳DeepSeek V3.2 | cloud | 80 | 6/4/0 | $1.00/M |
| 🌍Cydonia 24B V4.1 | cloud | 80 | 6/4/0 | $1.00/M |
| 🇨🇳Qwen3 8B | cloud | 80 | 6/4/0 | $1.00/M |
| 🇨🇳Qwen3.6 Flash | cloud | 80 | 6/4/0 | $1.00/M |
| 🇺🇸Nemotron 3 Super (free) | cloud | 80 | 6/4/0 | $1.00/M |
| 🌍Owl Alpha | cloud | 80 | 6/4/0 | $1.00/M |
| 🇨🇳DeepSeek V3.2 Exp | cloud | 80 | 6/4/0 | $1.00/M |
| 🇨🇳Qwen 3.7 Plus | cloud | 80 | 6/4/0 | $1.00/M |
| 🇨🇳Qwen 3.5 Flash (v2) | cloud | 80 | 6/4/0 | $1.00/M |
| 🇯🇵🇰🇷Solar Pro v3 | cloud | 80 | 6/4/0 | $1.00/M |
| 🇪🇺Mistral Medium 3.1 | cloud | 80 | 6/4/0 | $1.00/M |
| 🇺🇸Amazon Nova Premier | cloud | 80 | 7/2/1 | $1.00/M |
| 🇺🇸Claude Sonnet 4.5 | cloud | 80 | 7/2/1 | $15.00/M |
| 🇺🇸Claude Opus 4.7 Fast | cloud | 80 | 7/2/1 | $150.00/M |
| 🇺🇸Amazon Nova Pro v1 | cloud | 80 | 7/2/1 | free |
| 🌍Seed 1.6 Flash | cloud | 80 | 7/2/1 | $0.30/M |
| 🌍Seed 1.6 | cloud | 80 | 7/2/1 | $1.00/M |
| 🇨🇳Qwen3 235B A22B 2507 | cloud | 80 | 7/2/1 | $1.00/M |
| 🇺🇸Cohere Command A | cloud | 80 | 7/2/1 | $10.00/M |
| 🇺🇸Claude Haiku 4.5 | cloud | 80 | 7/2/1 | $25.00/M |
| 🇨🇳MiniMax M2*[30] | cloud | 80 | 7/2/1 | $2.00/M |
| 🇺🇸Claude Opus 4.1 | cloud | 80 | 7/2/1 | $25.00/M |
| 🇺🇸Claude Opus 4 | cloud | 80 | 7/2/1 | $1.00/M |
| 🌍Mancer Weaver | cloud | 80 | 7/2/1 | $1.00/M |
| 🇨🇳Qwen Plus (2025-07-28) | cloud | 80 | 7/2/1 | $1.00/M |
| 🇺🇸GPT-5.4 Nano | cloud | 80 | 7/2/1 | $1.00/M |
| 🇺🇸Claude Haiku Latest | cloud | 80 | 7/2/1 | $1.00/M |
| 🌍Perceptron Mk1 | cloud | 80 | 7/2/1 | $1.00/M |
| 🇺🇸GPT-5.2 Chat | cloud | 80 | 7/2/1 | $1.00/M |
| 🌍Kimi K2 Thinking | cloud | 80 | 7/2/1 | $1.00/M |
| 🌍Voxtral Small 24B | cloud | 80 | 7/2/1 | $1.00/M |
| 🇺🇸GPT-OSS-120B | cloud | 80 | 7/2/1 | $1.00/M |
| 🇪🇺Mistral Medium 3 | cloud | 80 | 7/2/1 | $1.00/M |
| 🇺🇸Gemma 3 12B IT | cloud | 80 | 7/2/1 | $1.00/M |
| 🇪🇺Mistral Saba | cloud | 80 | 7/2/1 | $1.00/M |
| 🇪🇺Falcon3-Mamba-7B-4bits (4-bit 3.7GB) | local | 80 | 8/0/2 | local |
| 🇺🇸Grok Code Fast 1 | cloud | 80 | 8/0/2 | free |
| 🌍Seed 2.0 Lite | cloud | 80 | 8/0/2 | free |
| 🇺🇸Llama 4 Scout | cloud | 80 | 8/0/2 | $0.30/M |
| 🌍Aion 1.0 | cloud | 80 | 8/0/2 | $1.00/M |
| 🇺🇸Anthropic: Claude Opus 4.8 (Fast) | cloud | 80 | 8/1/1 | $31.00/M |
| 🇪🇺Mistral Large 3 | cloud | 80 | 7/2/1 | $0.80/M |
| 🇪🇺Falcon3-10B-Instruct-4bit (4-bit 5.3GB) | local | 79 | 7/0/1 | local |
| 🇪🇺Falcon3-3B-Instruct-4bit (4-bit 1.7GB) | local | 79 | 7/0/1 | local |
| 🇺🇸Gemma 4 26B A4B*[2] | cloud | 78 | 6/4/0 | $0.14/M |
| 🇺🇸Gemma 4 31B*[1] | cloud | 78 | 6/3/1 | $0.20/M |
| 🌍Palmyra X5 | cloud | 78 | 7/2/1 | $1.00/M |
| 🇺🇸Anthropic: Claude Opus 4.8 | cloud | 78 | 6/3/1 | $15.50/M |
| 🇺🇸Anthropic: Claude Opus Latest | cloud | 78 | 6/3/1 | $15.50/M |
| 🇨🇳Qwen3 8B 5-bit (Q5_K_M 5.5GB) | local | 78 | 7/1/2 | local |
| 🇺🇸Nemotron 3 Super | cloud | 78 | 7/3/0 | $0.90/M |
| 🇺🇸GPT-4.1 Mini | cloud | 78 | 7/3/0 | $0.60/M |
| 🇺🇸Llama 4 Maverick | cloud | 78 | 7/2/1 | $0.80/M |
| 🇺🇸Claude Opus 4.7 | cloud | 78 | 7/2/1 | $15.00/M |
| 🇨🇳Qwen: Qwen3.5 Plus 2026-02-15 | cloud | 78 | 7/2/1 | $1.54/M |
| 🇨🇳Qwen3 Coder Plus | cloud | 78 | 7/2/1 | $1.50/M |
| 🇺🇸Grok 3 Mini | cloud | 78 | 7/3/0 | $0.60/M |
| 🇺🇸inclusionAI Ling 2.6 | cloud | 77 | 4/6/0 | $1.30/M |
| 🇪🇺Mistral Nemo 12B 4-bit (4-bit 6.3GB) | local | 77 | 7/2/1 | local |
| 🇺🇸Google: Gemma 4 31B (free) | cloud | 77 | 6/3/1 | free |
| 🇺🇸Gemma 3n 2B (4-bit 2.6GB) | local | 77 | 7/1/2 | local |
| 🇨🇳Qwen 3.6 Plus*[3] | cloud | 77 | 6/4/0 | $0.81/M |
| 🇺🇸GPT-5.4 | cloud | 77 | 6/3/1 | $6.25/M |
| 🇺🇸GPT-4.1 Nano | cloud | 77 | 7/2/1 | $1.00/M |
| 🇺🇸OpenAI GPT Mini Latest | cloud | 77 | 6/3/1 | $2.69/M |
| 🇺🇸Anthropic Claude Sonnet Latest | cloud | 77 | 6/3/1 | $9.72/M |
| 🇺🇸Anthropic: Claude Opus 4.7 | cloud | 77 | 7/1/2 | $15.80/M |
| 🇺🇸Gemini 2.5 Flash*[4] | cloud | 76 | 5/5/0 | $0.96/M |
| 🌍Kimi K2.6*[5] | cloud | 75 | 5/5/0 | $1.57/M |
| 🇺🇸Ling 2.6 Flash | cloud | 75 | 5/5/0 | free |
| 🇨🇳Qwen3 Coder Next | cloud | 75 | 5/5/0 | $1.00/M |
| 🇨🇳Qwen3 Next 80B A3B | cloud | 75 | 5/5/0 | $1.00/M |
| 🇺🇸Claude Opus 4.5 | cloud | 75 | 5/5/0 | $25.00/M |
| 🇺🇸Grok 4.3 | cloud | 75 | 5/5/0 | $1.00/M |
| 🇺🇸Grok Build 0.1 | cloud | 75 | 5/5/0 | $1.00/M |
| 🇺🇸Grok 4.20 | cloud | 75 | 6/3/1 | $1.63/M |
| 🇺🇸Gemini 3 Flash | cloud | 75 | 6/3/1 | $1.00/M |
| 🇺🇸Gemini 2.0 Flash Lite | cloud | 75 | 6/3/1 | $0.30/M |
| 🇺🇸GPT-5.1 | cloud | 75 | 6/3/1 | $10.00/M |
| 🇪🇺Mistral Small 2603 | cloud | 75 | 6/3/1 | $0.60/M |
| 🇪🇺Devstral Medium | cloud | 75 | 6/3/1 | $2.00/M |
| 🌍Remm Slerp L2 13B | cloud | 75 | 6/3/1 | $1.00/M |
| 🌍GPT Chat Latest | cloud | 75 | 6/3/1 | $1.00/M |
| 🇺🇸Phi-4 Mini | cloud | 75 | 6/3/1 | $1.00/M |
| 🇺🇸GPT-5 Chat | cloud | 75 | 6/3/1 | $1.00/M |
| 🇺🇸Hermes 4 70B | cloud | 75 | 6/3/1 | $1.00/M |
| 🇯🇵🇰🇷Solar Pro 3 | cloud | 75 | 6/3/1 | $1.00/M |
| 🌍Kimi K2 | cloud | 75 | 6/3/1 | $1.00/M |
| 🇨🇳GLM 4 32B | cloud | 75 | 6/3/1 | $1.00/M |
| 🇺🇸Nemotron 3 Ultra | cloud | 75 | 6/3/1 | $1.00/M |
| 🇨🇳Qwen3 235B A22B | cloud | 75 | 6/3/1 | $1.00/M |
| 🌍Morph V3 Large | cloud | 75 | 7/1/2 | free |
| 🌍Jamba Large 1.7 | cloud | 75 | 7/1/2 | $3.00/M |
| 🌍Mercury 2 | cloud | 75 | 7/1/2 | $1.00/M |
| 🇺🇸GPT-5.2*[19] | cloud | 75 | 7/1/2 | free |
| 🇪🇺Devstral Small | cloud | 75 | 7/1/2 | $0.30/M |
| 🌍Morph V3 Fast | cloud | 75 | 7/1/2 | $1.20/M |
| 🌍Inflection 3 Productivity | cloud | 75 | 7/1/2 | $1.00/M |
| 🇺🇸Command R7B (Cohere) | cloud | 75 | 7/1/2 | $1.00/M |
| 🇺🇸Grok 4 Fast | cloud | 75 | 6/3/1 | $0.55/M |
| 🇺🇸Grok 4.1 Fast | cloud | 75 | 6/2/2 | $0.29/M |
| 🇺🇸OpenAI: GPT-5.3 Chat | cloud | 75 | 6/2/2 | $10.97/M |
| 🇺🇸Gemini 3.1 Flash Lite | cloud | 75 | 5/5/0 | free |
| 🇺🇸Grok 4.20 Multi-Agent | cloud | 74 | 5/5/0 | $15.00/M |
| 🇨🇳Qwen2.5 0.5B (4-bit 0.4GB) | local | 74 | 7/1/2 | local |
| 🇺🇸OpenAI: GPT-5.4 Mini | cloud | 73 | 6/3/1 | $2.68/M |
| 🇺🇸OpenAI: GPT-5.4 Image 2 | cloud | 73 | 6/2/2 | $9.03/M |
| 🇺🇸Llama 3.2 1B (4-bit 0.8GB) | local | 73 | 6/1/3 | local |
| 🇺🇸SmolLM2 1.7B (4-bit 1.0GB) | local | 71 | 6/0/4 | local |
| 🇺🇸Gemini 2.0 Flash | cloud | 70 | 5/4/1 | $0.40/M |
| 🇺🇸Hermes 4 405B | cloud | 70 | 5/4/1 | $1.00/M |
| 🇨🇳Qwen Plus 0728 (thinking) | cloud | 70 | 5/4/1 | $1.00/M |
| 🇺🇸GPT-5.2 Codex*[18] | cloud | 70 | 6/2/2 | free |
| 🇺🇸GPT-5.1 Codex | cloud | 70 | 6/2/2 | $10.00/M |
| 🇨🇳DeepSeek Chat V3.1*[33] | cloud | 70 | 6/2/2 | $0.79/M |
| 🇨🇳Qwen3 30B A3B Thinking 2507 | cloud | 70 | 6/2/2 | $1.00/M |
| 🇨🇳MiniMax M2.7*[6] | cloud | 70 | 7/1/2 | $0.50/M |
| 🇨🇳DeepSeek R1 0528 | cloud | 68 | 6/2/2 | $8.00/M |
| 🇨🇳Xiaomi MiMo V2.5 Pro | cloud | 68 | 7/0/3 | $1.60/M |
| 🇨🇳MiniMax M2.5 | cloud | 65 | 5/3/2 | free |
| 🇺🇸GPT-5.1 Codex Mini*[20] | cloud | 65 | 5/3/2 | free |
| 🇨🇳GLM 4.6 | cloud | 65 | 5/3/2 | $1.00/M |
| 🇨🇳Qwen3 Next 80B | cloud | 65 | 5/3/2 | $1.00/M |
| 🇺🇸inclusionAI Ring 2.6 | cloud | 65 | 6/1/3 | free |
| 🇺🇸Nemotron 3 Nano 30B (free) | cloud | 65 | 6/1/3 | $1.00/M |
| 🇺🇸Gemma 4 E4B 5-bit*[15] (5-bit 5.1GB) | local | 64 | 6/1/3 | local |
| 🇪🇺Falcon3-1B-Instruct-4bit (4-bit 0.9GB) | local | 62 | 6/0/2 | local |
| 🇺🇸GPT-5.5*[7] | cloud | 60 | 5/2/3 | $12.50/M |
| 🇨🇳DeepSeek V4 Flash | cloud | 60 | 4/3/3 | $0.18/M |
| 🇨🇳MiniMax M2.1*[29] | cloud | 60 | 5/2/3 | $2.00/M |
| 🇺🇸OpenAI GPT Latest | cloud | 60 | 5/2/3 | $21.08/M |
| 🇪🇺Mistral Small 3.1 24B | cloud | 60 | 5/2/3 | $1.00/M |
| 🌍Reka Edge | cloud | 60 | 5/2/3 | $0.10/M |
| 🇺🇸GPT-5.3 Codex*[17] | cloud | 55 | 5/1/4 | free |
| 🇺🇸GPT-OSS-20B | cloud | 55 | 5/1/4 | $1.00/M |
| 🇺🇸GPT-5.4 Pro | cloud | 52 | 5/1/4 | $75.00/M |
| 🌍Ring 2.6 1T*[26] | cloud | 50 | 4/2/4 | $0.62/M |
| 🇺🇸Gemma 4 26B A4B (free) | cloud | 50 | 4/2/4 | $1.00/M |
| 🌍Laguna XS.2 (free) | cloud | 50 | 4/2/4 | $1.00/M |
| 🇨🇳GLM 4.5 Air | cloud | 50 | 5/0/5 | $1.00/M |
| 🌍DeltaCoder 9B 5-bit*[16] (Q5_K_M 6.1GB) | local | 47 | 4/1/5 | local |
| 🌍Kimi K2.7 Code | cloud | 45 | 4/1/5 | $1.00/M |
| 🇺🇸GPT-5.5 Pro*[8] | cloud | 43 | 4/1/5 | $75.00/M |
| 🇨🇳Xiaomi: MiMo-V2-Omni | cloud | 42 | 4/1/5 | $1.22/M |
| 🇨🇳Qwen: Qwen3.5 397B A17B | cloud | 40 | 2/3/5 | $2.85/M |
| 🇨🇳Qwen 3 32B | cloud | 40 | 3/2/5 | $1.00/M |
| 🇨🇳MiniMax M3 | cloud | 40 | 4/0/6 | $1.00/M |
| 🌍Nex-N2-Pro (free) | cloud | 40 | 4/0/6 | $1.00/M |
| 🇨🇳Qwen: Qwen3.5-27B | cloud | 38 | 2/3/5 | $1.68/M |
| 🇺🇸Grok 4 | cloud | 38 | 3/2/5 | $1.00/M |
| 🇨🇳DeepSeek V4 Pro | cloud | 38 | 4/0/6 | $0.57/M |
| 🇨🇳MiniMax M1 | cloud | 35 | 2/3/5 | free |
| 🇺🇸Google Gemini Flash Latest | cloud | 33 | 0/3/7 | $8.03/M |
| 🌍MoonshotAI: Kimi K2.6 (free) | cloud | 33 | 3/1/6 | free |
| 🇨🇳Qwen 3.6 35B A3B*[27] | cloud | 30 | 2/2/6 | $1.00/M |
| 🇨🇳MiMo-V2.5 | cloud | 30 | 2/2/6 | $1.00/M |
| 🇺🇸GPT-5 Codex*[23] | cloud | 30 | 3/0/7 | $10.00/M |
| 🌍MoonshotAI Kimi Latest | cloud | 30 | 3/0/7 | $3.69/M |
| 🇺🇸Gemini 3 Pro Image*[36] | cloud | 30 | 3/0/7 | $1.00/M |
| 🇨🇳GLM 4.5 | cloud | 30 | 3/0/7 | $1.00/M |
| 🇨🇳DeepSeek-R1 1.5B (4-bit 1.0GB) | local | 28 | 2/1/7 | local |
| 🇺🇸Google Gemini Pro Latest | cloud | 27 | 0/2/8 | $10.70/M |
| 🇨🇳Qwen3.5 0.8B (4-bit 0.5GB) | local | 26 | 2/1/7 | local |
| 🇨🇳Tencent HY3 | cloud | 25 | 2/1/7 | $1.00/M |
| 🇺🇸Gemini 3.5 Flash*[28] | cloud | 25 | 2/1/7 | $9.00/M |
| 🇨🇳Qwen 3 30B A3B | cloud | 25 | 2/1/7 | $1.00/M |
| 🇨🇳Xiaomi MiMo V2 Pro*[12] | cloud | 24 | 2/1/7 | $2.50/M |
| 🇨🇳Qwen: Qwen3.5-122B-A10B | cloud | 22 | 1/2/7 | $2.43/M |
| 🌍StepFun 3.5 Flash | cloud | 20 | 2/0/8 | $0.60/M |
| 🌍Pareto Code Router | cloud | 18 | 1/1/8 | $1.37/M |
| 🇨🇳GLM 5.1 | cloud | 15 | 1/1/8 | free |
| 🇨🇳GLM 4.7*[32] | cloud | 15 | 1/1/8 | $2.00/M |
| 🇨🇳Qwen: Qwen3.6 27B | cloud | 13 | 0/2/8 | $2.77/M |
| 🇺🇸GPT-5.1 Codex Max*[22] | cloud | 10 | 0/2/8 | $10.00/M |
| 🇨🇳DeepSeek V3.2 Speciale*[11] | cloud | 10 | 1/0/9 | $1.50/M |
| 🌍Intellect 3 | cloud | 10 | 1/0/9 | $2.00/M |
| 🇺🇸Nemotron 3 Nano Omni (free) | cloud | 10 | 1/0/9 | $1.00/M |
| 🌍StepFun: Step 3.7 Flash | cloud | 10 | 1/0/9 | $0.97/M |
| 🇨🇳Z.ai: GLM 5 Turbo | cloud | 10 | 1/0/9 | $3.61/M |
| 🇨🇳GLM-5 | cloud | 10 | 1/0/9 | $1.00/M |
| 🇺🇸LFM2.5 1.2B Thinking (free) | cloud | 10 | 1/0/9 | $1.00/M |
| 🌍Perplexity Sonar Reasoning Pro | cloud | 10 | 1/0/9 | $1.00/M |
| 🇺🇸OpenAI o3 | cloud | 10 | 1/0/9 | $1.00/M |
| 🇺🇸GPT-5 Mini*[25] | cloud | 5 | 0/1/9 | $2.00/M |
| 🇨🇳Qwen 3.5 35B MoE*[34] | cloud | 5 | 0/1/9 | $1.00/M |
| 🇨🇳Qwen3 235B A22B Thinking 2507 | cloud | 5 | 0/1/9 | $1.00/M |
| 🌍OLMo 3 32B Think | cloud | 0 | 0/0/10 | $1.50/M |
| 🌍Reka Flash 3 | cloud | 0 | 0/0/10 | $1.00/M |
| 🇺🇸GPT-5 Nano*[21] | cloud | 0 | 0/0/10 | free |
| 🇺🇸GPT-5*[24] | cloud | 0 | 0/0/10 | $10.00/M |
| 🇨🇳GLM 4.7 Flash*[31] | cloud | 0 | 0/0/10 | $2.00/M |
| 🇨🇳Qwen3 14B | cloud | 0 | 0/0/10 | $1.00/M |
| 🌍Trinity Large Thinking | cloud | 0 | 0/0/10 | $1.00/M |
| 🇨🇳GLM 5V Turbo | cloud | 0 | 0/0/10 | $1.00/M |
| 🇨🇳DeepSeek V4 Flash (free) | cloud | 0 | 0/0/10 | $1.00/M |
| 🌍Laguna M.1 (free) | cloud | 0 | 0/0/10 | $1.00/M |
| 🇨🇳Qwen 3.5 9B | cloud | 0 | 0/0/10 | $1.00/M |
| 🌍Arcee Trinity Mini | cloud | 0 | 0/0/10 | $1.00/M |
| 🇺🇸Nemotron Nano 9B v2 | cloud | 0 | 0/0/10 | $1.00/M |
| 🇺🇸Nemotron Super 49B*[35] | cloud | 0 | 0/0/10 | $1.00/M |
| 🌍Arcee Coder Large | cloud | 0 | 0/0/10 | $1.00/M |
| 🇺🇸Dolphin Mistral 24B (free) | cloud | 0 | 0/0/10 | $1.00/M |
| 🌍Arcee Virtuoso Large | cloud | 0 | 0/0/10 | $1.00/M |
| 🇨🇳Qwen3 Next 80B (free) | cloud | 0 | 0/0/10 | $1.00/M |
| 🇨🇳DeepSeek R1 Distill Qwen 32B | cloud | 0 | 0/0/10 | $1.00/M |
| 🇺🇸OpenAI o4 Mini High | cloud | 0 | 0/0/10 | $1.00/M |
| 🇨🇳Qwen3 Coder 480B (free) | cloud | 0 | 0/0/10 | $1.00/M |
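
Many rows above are consistent with a simple rule, if the three slash-separated numbers in each row are read as passed, partial, and failed tasks: 10 points per pass and 5 per partial reproduces the score (8/2/0 gives 90, 7/3/0 gives 85, 6/4/0 gives 80). This is an inference from the table, not the site's documented formula, and some rows do not fit it (SmolLM3 3B is listed at 93 with 9/0/1), so the real scoring is presumably finer-grained. A minimal sketch under that assumption, with `estimate_score` as a hypothetical helper name:

```python
def estimate_score(ppf: str, pass_points: int = 10, partial_points: int = 5) -> int:
    """Estimate a score from a 'pass/partial/fail' string such as '8/2/0'.

    Assumption (not confirmed by the source): a pass is worth 10 points,
    a partial is worth 5, and a fail is worth 0.
    """
    passed, partial, _failed = (int(part) for part in ppf.split("/"))
    return passed * pass_points + partial * partial_points


if __name__ == "__main__":
    # Rows taken from the table above; compare the estimate with the listed score.
    for ppf, listed in (("8/2/0", 90), ("7/3/0", 85), ("6/4/0", 80), ("9/0/1", 93)):
        print(ppf, "estimated:", estimate_score(ppf), "listed:", listed)
```

Rows where the estimate and the listed score disagree are the ones worth checking against the methodology page.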
Tool-calling reliability — can the model actually function as an agent? 6 tests: single tool, multi-tool, required mode, false positive avoidance, multi-turn chaining, argument correctness.
| Model | Score | Single | Multi | Required | No FP | Chain | Args |
|---|---|---|---|---|---|---|---|
| SmolLM3-3B | 50% | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Phi-4-mini | 17% | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
| gemma4:e4b | 100% | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Qwen3-4B-Function-Calling-xLAM | 83% | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ |
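Two of these models, SmolLM3 3B (93) and Phi-4-mini (90), are among the top scorers on code quality above, yet they pass only 50% and 17% of the tool-calling tests. Code quality ≠ agent capability.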
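
To make two of the six checks concrete, here is a minimal sketch of how "false positive avoidance" and "argument correctness" might be verified. The response shape (a dict with a `tool_calls` list of `{"name", "arguments"}` entries, with arguments as a JSON string), the function names, and the `get_weather` tool are hypothetical illustrations, not the benchmark's actual harness.

```python
import json


def no_false_positive(response: dict) -> bool:
    """Pass if the model made no tool call for a prompt that needs none."""
    return not response.get("tool_calls")


def args_correct(response: dict, tool_name: str, expected_args: dict) -> bool:
    """Pass if exactly one call to tool_name carries the expected arguments."""
    calls = response.get("tool_calls") or []
    if len(calls) != 1 or calls[0].get("name") != tool_name:
        return False
    try:
        actual = json.loads(calls[0].get("arguments", ""))
    except (json.JSONDecodeError, TypeError):
        return False  # malformed or non-string arguments count as a failure
    return actual == expected_args


if __name__ == "__main__":
    # Hypothetical responses: one plain chat reply, one tool call.
    chat_only = {"content": "Hello!", "tool_calls": []}
    weather = {
        "tool_calls": [{"name": "get_weather", "arguments": '{"city": "Paris"}'}]
    }
    print(no_false_positive(chat_only))
    print(args_correct(weather, "get_weather", {"city": "Paris"}))
```

Comparing parsed JSON rather than raw strings keeps the check insensitive to key order and whitespace in the model's output.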