Real agent coding benchmarks + MCP server quality. Updated hourly.

Updated: 2026-05-09T13:57:41
| Model | Tier | Score | Results | Cost | Time |
|---|---|---|---|---|---|
| Claude Sonnet 4 | premium | 83.30 | 7/10 passed | $0.0191 | 23s |
| Gemma 4 31B | premium | 80.00 | 6/10 passed | $0.0005 | 118s |
| Gemma 4 26B A4B | premium | 78.30 | 6/10 passed | $0.0005 | 66s |
| Mistral Large 3 | premium | 78.00 | 6/10 passed | $0.0018 | 18s |
| Qwen 3.6 Plus | budget | 76.60 | 6/10 passed | $0.0609 | 574s |
| Gemini 2.5 Flash | free-tier | 76.40 | 5/10 passed | $0.0037 | 12s |
| Kimi K2.6 | premium | 75.00 | 5/10 passed | $0.0051 | 40s |
| GPT-5.4 | premium | 74.90 | 6/10 passed | $0.0153 | 19s |
| MiniMax M2.7 | premium | 60.00 | 6/10 passed | $0.0190 | 137s |
| GPT-5.5 | premium | 58.30 | 5/10 passed | $0.0655 | 67s |

Updated: 2026-05-09T14:16:05
| Server | Status | Score | Stars | Issues | Updated |
|---|---|---|---|---|---|
| playwright-mcp | ● live | 80 | 32,252 | 4 | 2026-05-09 |
| mcp-git | ● live | 73 | 8,048 | 67 | 2026-05-09 |
| github-mcp-server | ● live | 60 | 29,649 | 329 | 2026-05-09 |
| fastmcp | ● live | 60 | 25,089 | 248 | 2026-05-09 |
| awesome-mcp-servers | ● live | 60 | 86,537 | 1130 | 2026-05-09 |
| mcp-pandoc | ● live | 43 | 534 | 6 | 2026-05-07 |

Updated: 2026-05-09T09:35:55
| Model | Bits | Size | Score | Results | Time |
|---|---|---|---|---|---|
| Qwen 3.5 9B (4-bit) | 4-bit | ~5GB | 83.0 | 7/10 passed | 190s |
| AgenticQwen 8B (4-bit) | 4-bit | ~5GB | 81.5 | 8/10 passed | 189s |
| Bonsai 4B (1-bit) | 1-bit | 545MB | 79.9 | 7/10 passed | 18s |
| Ternary Bonsai 1.7B (2-bit) | 2-bit (ternary) | 442MB | 79.9 | 7/10 passed | 10s |
| Bonsai 8B (1-bit) | 1-bit | 1.1GB | 79.8 | 8/10 passed | 15s |
| Ternary Bonsai 4B (2-bit) | 2-bit (ternary) | 1.0GB | 79.6 | 7/10 passed | 20s |
| Ternary Bonsai 8B (2-bit) | 2-bit (ternary) | 2.1GB | 78.2 | 7/10 passed | 22s |
| Bonsai 1.7B (1-bit) | 1-bit | 237MB | 73.4 | 4/10 passed | 8s |

Benchmarks refresh hourly via cron.
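Each table carries its own `Updated:` stamp, so a reader can check how far apart the refreshes actually are. The following is a minimal sketch, not part of the site. It assumes all three stamps share one (unstated) timezone and treats a table as stale when it lags the newest stamp by more than one hour. The names `table_ages`, `stale_tables` and `MAX_AGE` are hypothetical.

```python
from datetime import datetime, timedelta

# "Updated:" stamps copied from the page. The timezone is not stated,
# so we assume all three share the same clock.
UPDATED = {
    "agent models": "2026-05-09T13:57:41",
    "MCP servers": "2026-05-09T14:16:05",
    "local models": "2026-05-09T09:35:55",
}

# Hypothetical staleness threshold matching an hourly cron schedule.
MAX_AGE = timedelta(hours=1)


def table_ages(stamps: dict[str, str]) -> dict[str, timedelta]:
    """Return how far each table lags the most recently refreshed one."""
    parsed = {name: datetime.fromisoformat(ts) for name, ts in stamps.items()}
    newest = max(parsed.values())
    return {name: newest - ts for name, ts in parsed.items()}


def stale_tables(stamps: dict[str, str], max_age: timedelta = MAX_AGE) -> list[str]:
    """Return names of tables whose lag behind the newest exceeds max_age."""
    return [name for name, age in table_ages(stamps).items() if age > max_age]


if __name__ == "__main__":
    for name, age in table_ages(UPDATED).items():
        print(f"{name}: {age} behind newest")
    print("stale:", stale_tables(UPDATED))
```

The one-hour threshold is an assumption. If each table runs on its own cron job, comparing every stamp against the current time (`datetime.now()` in the matching timezone) is a better freshness test than comparing the stamps with each other.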
Suggest an improvement, report an error, or just say hi.