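This does not check whether channels conform to the API specification. It only tests each channel's ability to complete tasks, and compares results across many channels to measure how they differ.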
Test runs: 811
Test tasks: 10
Total cost: $424.02
(Chart) Models in the lower-right region offer the best price/performance. A larger bubble radius indicates faster generation.
| Site | Channel | Model | Pass rate | Avg. cost (USD) | Token TPS | Test runs |
|---|---|---|---|---|---|---|
| Packy Code | OFFICIAL | claude-opus-4-8 | 100% | $1.0597 | 28.2 | 31 |
| Micu | OFFICIAL | claude-opus-4-8 | 96.8% | $0.7969 | 31.5 | 31 |
| Right Code | OFFICIAL | claude-opus-4-8 | 93.5% | $1.0159 | 24.7 | 31 |
| CCTQ | OFFICIAL | claude-opus-4-8 | 93.5% | $0.8942 | 28.2 | 31 |
| SSSAI Code | OFFICIAL | claude-opus-4-8 | 93.5% | $0.5 | 27.2 | 31 |
| Yunwu | OFFICIAL | claude-opus-4-8 | 90.3% | $1.6242 | 25.3 | 31 |
| Somebody | AWSQ | claude-opus-4-8 | 90.3% | $0.295 | 19.3 | 31 |
| 78 Code | OFFICIAL | claude-opus-4-8 | 90.3% | $0.758 | 26.8 | 31 |
| TimiCC | OFFICIAL | claude-opus-4-8 | 87.1% | $0.7123 | 27.8 | 31 |
| Duck Code | OFFICIAL | claude-opus-4-8 | 87.1% | $0.6632 | 26.1 | 31 |
| Cubence | OFFICIAL | claude-opus-4-8 | 87.1% | $0.9261 | 29.3 | 31 |
| Cubence | OFFICIAL | gpt-5.5 | 83.9% | $0.1377 | 24 | 31 |
| Micu | OFFICIAL | gpt-5.5 | 83.9% | $0.0787 | 24.8 | 31 |
| Neko Code | OFFICIAL | gpt-5.5 | 83.9% | $0.0766 | 22.6 | 31 |
| Yes Code | OFFICIAL | claude-opus-4-8 | 83.9% | $0.839 | 23.1 | 31 |
| TimiCC | OFFICIAL | gpt-5.5 | 80.6% | $0.0942 | 24.5 | 31 |
| Yes Code | OFFICIAL | gpt-5.5 | 80.6% | $0.248 | 32.5 | 31 |
| Neko Code | OFFICIAL | claude-opus-4-8 | 80.6% | $0.6983 | 22.2 | 31 |
| Packy Code | OFFICIAL | gpt-5.5 | 77.4% | $0.1175 | 23.3 | 31 |
| IKun Code | OFFICIAL | claude-opus-4-8 | 77.4% | $1.3333 | 26.1 | 31 |
| CCTQ | OFFICIAL | gpt-5.5 | 77.4% | $0.0714 | 25.3 | 31 |
| Right Code | OFFICIAL | gpt-5.5 | 76.7% | $0.0486 | 23.5 | 30 |
| SSSAI Code | OFFICIAL | gpt-5.5 | 74.2% | $0.2182 | 29.3 | 31 |
| Duck Code | OFFICIAL | gpt-5.5 | 74.2% | $0.1797 | 19.9 | 31 |
| IKun Code | OFFICIAL | gpt-5.5 | 67.7% | $0.1751 | 23.7 | 31 |
| 78 Code | OFFICIAL | gpt-5.5 | 67.7% | $0.0431 | 22.7 | 31 |
| Codex For | OFFICIAL | gpt-5.5 | 66.7% | $0.3865 | 32.1 | 6 |
Results are keyed by test task and show scores on real programming tasks such as terminal-bench / aider-polyglot, rather than API connectivity checks.
Historical pass rate = passed trials ÷ total trials × 100%. The horizontal line is the arithmetic mean of the historical pass rates of all configs.
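The pass-rate formula and the horizontal reference line can be sketched in Python as below. The function names are hypothetical, and the pass counts are back-calculated from the table (pass rate × test runs) rather than taken from raw trial data.

```python
def historical_pass_rate(passed_trials: int, total_trials: int) -> float:
    """Passed trials / total trials * 100, as a percentage."""
    if total_trials <= 0:
        raise ValueError("total_trials must be positive")
    return passed_trials / total_trials * 100


def reference_line(pass_rates: list[float]) -> float:
    """Horizontal line: arithmetic mean of every config's historical pass rate."""
    if not pass_rates:
        raise ValueError("need at least one config")
    return sum(pass_rates) / len(pass_rates)


# Pass counts back-calculated from table rows (pass rate x test runs).
rates = [
    historical_pass_rate(31, 31),  # Packy Code / claude-opus-4-8
    historical_pass_rate(23, 30),  # Right Code / gpt-5.5
    historical_pass_rate(4, 6),    # Codex For / gpt-5.5
]
print([round(r, 1) for r in rates])      # [100.0, 76.7, 66.7]
print(round(reference_line(rates), 1))   # mean over these three configs only
```

Because the line averages per-config rates, a config with only 6 runs (Codex For) carries the same weight as one with 31; pooling all trials first would generally give a different value.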
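All historical trials of each channel (config) are aggregated. Any task/config pair with fewer than 3 historical trials is flagged "insufficient sample" so that small samples do not mislead. Metrics include pass rate, average tokens, average cost, and Token TPS.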
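A minimal sketch of the per-config aggregation described above, assuming a hypothetical `Trial` record (the source does not describe its data schema). The threshold of 3 trials comes from the text; defining Token TPS as total tokens divided by total generation time is an assumption, as is which tokens count toward the average. The demo records are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_TRIALS = 3  # task/config pairs below this are flagged "insufficient sample"


@dataclass
class Trial:
    # Hypothetical record layout; not the benchmark's actual schema.
    config: str       # channel identifier
    task: str         # test task id
    passed: bool
    tokens: int       # tokens counted for this trial (assumed)
    cost_usd: float
    seconds: float    # generation time, assumed TPS denominator


def summarize(trials: list[Trial]) -> dict[str, dict]:
    """Aggregate all historical trials per config and flag small task samples."""
    by_config: dict[str, list[Trial]] = defaultdict(list)
    for t in trials:
        by_config[t.config].append(t)

    report = {}
    for config, ts in by_config.items():
        n = len(ts)
        per_task: dict[str, int] = defaultdict(int)
        for t in ts:
            per_task[t.task] += 1
        total_tokens = sum(t.tokens for t in ts)
        total_seconds = sum(t.seconds for t in ts)
        report[config] = {
            "trials": n,
            "pass_rate": sum(t.passed for t in ts) / n * 100,
            "avg_tokens": total_tokens / n,
            "avg_cost_usd": sum(t.cost_usd for t in ts) / n,
            # Assumed definition: total tokens / total generation time.
            "token_tps": total_tokens / total_seconds if total_seconds > 0 else 0.0,
            # Tasks with fewer than MIN_TRIALS trials for this config.
            "insufficient_sample": sorted(
                task for task, count in per_task.items() if count < MIN_TRIALS
            ),
        }
    return report


demo = [
    Trial("channel-A", "task-1", True, 1000, 0.10, 40.0),
    Trial("channel-A", "task-1", False, 2000, 0.20, 60.0),
    Trial("channel-A", "task-1", True, 1500, 0.15, 50.0),
    Trial("channel-A", "task-2", True, 800, 0.08, 30.0),
    Trial("channel-B", "task-1", False, 1200, 0.05, 30.0),
]
for name, stats in summarize(demo).items():
    print(name, stats)
```

The small-sample flag is kept per task rather than per config, matching the text's "task/config" wording: a config can have many trials overall yet still be thinly sampled on an individual task.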