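This benchmark does not check whether a channel conforms to the API specification. It only tests how well each channel completes tasks, and compares channels using statistics gathered across multiple channels.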
Test runs: 1,215
Test tasks: 14
Total cost: $570.99
| Site | Channel | Model | Pass rate | Avg. cost (USD) | Token TPS | Test runs |
|---|---|---|---|---|---|---|
| ByteCat | OFFICIAL | gpt-5.5 | 88.9% | $0.1606 | 33.7 | 36 |
| Cubence | OFFICIAL | gpt-5.5 | 88.6% | $0.1578 | 28.3 | 44 |
| TimiCC | OFFICIAL | gpt-5.5 | 86.4% | $0.1002 | 28.6 | 44 |
| Micu | OFFICIAL | claude-opus-4-8 | 86.4% | $0.7568 | 43.8 | 44 |
| Packy Code | OFFICIAL | claude-opus-4-8 | 86.4% | $1.0233 | 41.1 | 44 |
| SSSAI Code | OFFICIAL | claude-opus-4-8 | 84.1% | $0.4747 | 39.3 | 44 |
| CCTQ | OFFICIAL | claude-opus-4-8 | 84.1% | $0.8389 | 42.6 | 44 |
| Micu | OFFICIAL | gpt-5.5 | 84.1% | $0.0871 | 23.8 | 44 |
| Neko Code | OFFICIAL | gpt-5.5 | 84.1% | $0.0901 | 27.0 | 44 |
| CCTQ | OFFICIAL | gpt-5.5 | 81.8% | $0.0770 | 27.9 | 44 |
| Yes Code | OFFICIAL | gpt-5.5 | 81.8% | $0.2399 | 33.4 | 44 |
| Packy Code | OFFICIAL | gpt-5.5 | 81.8% | $0.1315 | 27.5 | 44 |
| Right Code | OFFICIAL | gpt-5.5 | 81.4% | $0.1035 | 28.8 | 43 |
| 78 Code | OFFICIAL | claude-opus-4-8 | 79.5% | $0.7206 | 39.9 | 44 |
| Somebody | AWSQ | claude-opus-4-8 | 79.5% | $0.2820 | 17.9 | 44 |
| Right Code | OFFICIAL | claude-opus-4-8 | 79.5% | $0.9314 | 38.3 | 44 |
| Yunwu | OFFICIAL | claude-opus-4-8 | 79.5% | $0.8709 | 39.2 | 44 |
| Duck Code | OFFICIAL | gpt-5.5 | 77.3% | $0.2137 | 25.2 | 44 |
| Cubence | OFFICIAL | claude-opus-4-8 | 77.3% | $0.7315 | 36.0 | 44 |
| SSSAI Code | OFFICIAL | gpt-5.5 | 77.3% | $0.2296 | 28.6 | 44 |
| TimiCC | OFFICIAL | claude-opus-4-8 | 75.0% | $0.6531 | 37.0 | 44 |
| Yes Code | OFFICIAL | claude-opus-4-8 | 75.0% | $0.7261 | 37.8 | 44 |
| 78 Code | OFFICIAL | gpt-5.5 | 75.0% | $0.0542 | 28.5 | 44 |
| Duck Code | OFFICIAL | claude-opus-4-8 | 75.0% | $0.6352 | 27.2 | 44 |
| ByteCat | OFFICIAL | claude-opus-4-8 | 75.0% | $0.8251 | 48.3 | 36 |
| Neko Code | OFFICIAL | claude-opus-4-8 | 72.7% | $0.5941 | 34.0 | 44 |
| IKun Code | OFFICIAL | gpt-5.5 | 72.7% | $0.1894 | 27.4 | 44 |
| IKun Code | OFFICIAL | claude-opus-4-8 | 72.7% | $1.2604 | 32.7 | 44 |
Results are keyed by test task and show scores on real programming tasks such as terminal-bench / aider-polyglot, rather than API connectivity checks.
Historical pass rate = passed trials ÷ total trials × 100%. The horizontal line is the arithmetic mean of the historical pass rates of all configs.
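
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the function names, the config names `config-a` and `config-b`, and their trial counts are all hypothetical.

```python
from statistics import mean


def pass_rate(passed: int, total: int) -> float:
    """Historical pass rate in percent: passed trials / total trials * 100."""
    if total <= 0:
        raise ValueError("total trials must be positive")
    return passed / total * 100


def baseline(rates: list[float]) -> float:
    """Horizontal line: arithmetic mean of every config's pass rate."""
    return mean(rates)


# Hypothetical (passed, total) counts per config, for illustration only.
trial_counts = {
    "config-a": (33, 44),
    "config-b": (22, 44),
}

rates = {name: pass_rate(p, n) for name, (p, n) in trial_counts.items()}
line = baseline(list(rates.values()))

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
print(f"baseline: {line:.1f}%")
```

Note that the baseline is an unweighted mean of per-config rates, so a config with few trials counts as much as one with many.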
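Statistics cover every historical trial for each channel (config). A task/config pair with fewer than 3 historical trials is marked "insufficient sample" so that small samples do not mislead. Metrics include pass rate, average tokens, average cost, and Token TPS.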
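
As a rough sketch of the aggregation described above, the following Python groups trials by task/config, flags pairs below the 3-trial threshold, and computes the four metrics per config. The `Trial` record layout, the function names, and the sample data are hypothetical. In particular, Token TPS is assumed here to mean total tokens divided by total elapsed seconds; the source does not define it.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_TRIALS = 3  # threshold stated in the text


@dataclass
class Trial:
    """Hypothetical record for one trial; field names are assumptions."""
    task: str
    config: str
    passed: bool
    tokens: int
    cost_usd: float
    seconds: float


def insufficient_pairs(trials: list[Trial]) -> set[tuple[str, str]]:
    """Return (task, config) pairs with fewer than MIN_TRIALS trials."""
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for t in trials:
        counts[(t.task, t.config)] += 1
    return {pair for pair, n in counts.items() if n < MIN_TRIALS}


def config_metrics(trials: list[Trial], config: str) -> dict[str, float]:
    """Aggregate all historical trials of one config into the four metrics."""
    rows = [t for t in trials if t.config == config]
    if not rows:
        raise ValueError(f"no trials for config {config!r}")
    n = len(rows)
    total_tokens = sum(t.tokens for t in rows)
    total_seconds = sum(t.seconds for t in rows)
    return {
        "pass_rate": sum(t.passed for t in rows) / n * 100,
        "avg_tokens": total_tokens / n,
        "avg_cost_usd": sum(t.cost_usd for t in rows) / n,
        # Assumed definition: tokens per second over all trials combined.
        "token_tps": total_tokens / total_seconds if total_seconds else 0.0,
    }


# Made-up sample data: three trials on task-1, one on task-2.
sample = [
    Trial("task-1", "cfg-a", True, 1000, 0.25, 10.0),
    Trial("task-1", "cfg-a", True, 1000, 0.25, 10.0),
    Trial("task-1", "cfg-a", False, 1000, 0.25, 10.0),
    Trial("task-2", "cfg-a", True, 1000, 0.25, 10.0),
]

flagged = insufficient_pairs(sample)
metrics = config_metrics(sample, "cfg-a")
print(flagged)
print(metrics)
```

In this sample, `task-2` under `cfg-a` has only one trial, so it would be shown as "insufficient sample", while the config-level metrics still use all four trials.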