Model Overview with Pricing and Capabilities
Model | Primary Task Area | Key Strengths | Additional Capabilities | HumanEval Accuracy | SWE-bench Accuracy | Premium Request Multiplier (Paid Plans) | Premium Request Multiplier (Free Plan) | Effective Cost per Request* |
---|---|---|---|---|---|---|---|---|
GPT-4.1 | General-purpose coding and writing | Fast, accurate code completions and explanations | Agent mode, visual | ~84% | ~55% | 0× (included) | 1× | $0.00 |
GPT-4o | General-purpose coding and writing | Fast completions and visual input understanding | Agent mode, visual | ~90% | ~33% | 0× (included) | 1× | $0.00 |
o3 | Deep reasoning and debugging | Multi-step problem solving and architecture-level code analysis | Reasoning | ~92% | ~72% | 1× | N/A | $0.04 |
o4-mini | Fast help with simple tasks | Fast, reliable answers to lightweight coding questions | Lower latency | ~90% | ~49% | 0.33× | N/A | $0.013 |
Claude Opus 4 | Deep reasoning and debugging | Complex problem-solving challenges, sophisticated reasoning | Reasoning, vision | ~94% | 72.5% | 10× | N/A | $0.40 |
Claude Sonnet 3.5 | Fast help with simple tasks | Quick responses for code, syntax, and documentation | Agent mode | 93.7% | ~60% | 1× | 1× | $0.04 |
Claude Sonnet 3.7 | Deep reasoning and debugging | Structured reasoning across large, complex codebases | Agent mode | ~91% | ~62% | 1× | N/A | $0.04 |
Claude Sonnet 3.7 Thinking | Deep reasoning and debugging | Enhanced reasoning with explicit thought processes | Agent mode, reasoning chains | ~92% | ~64% | 1.25× | N/A | $0.05 |
Claude Sonnet 4 | Deep reasoning and debugging | Balanced performance and practicality for coding workflows | Agent mode, vision | ~92% | 72.7% | 1× | N/A | $0.04 |
Gemini 2.5 Pro | Deep reasoning and debugging | Complex code generation, debugging, and research workflows | Reasoning | ~88% | 63.8% | 1× | N/A | $0.04 |
Gemini 2.0 Flash | Working with visuals | Real-time responses and visual reasoning for UI and diagram-based tasks | Visual, low latency | ~85% | ~45% | 0.25× | 1× | $0.01 |
Plan Allowances
Free Plan (Copilot Free)
- Code completions: Up to 2,000 per month
- Premium requests: Up to 50 per month
- Available models: GPT-4.1, GPT-4o, Claude Sonnet 3.5, Gemini 2.0 Flash
- All interactions count as premium requests
Paid Plans
- Code completions: Unlimited (with included models)
- Chat interactions: Unlimited (with included models)
- Premium request allowances:
  - Copilot Pro: 1,500 premium requests/month
  - Copilot Business: 300 premium requests/month
  - Copilot Enterprise: 1,000 premium requests/month
- Overage pricing: $0.04 per additional premium request
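To make the allowance and overage figures concrete, here is a minimal sketch of how a monthly overage charge could be estimated. The allowances and the $0.04 rate are copied from the list above; the function name `overage_cost` and the idea of passing in an already-multiplied request count are assumptions made for illustration, not part of any official billing API.

```python
# Monthly premium request allowances, copied from the list above.
PLAN_ALLOWANCE = {
    "Copilot Pro": 1500,
    "Copilot Business": 300,
    "Copilot Enterprise": 1000,
}

# Price per additional premium request once the allowance is used up.
OVERAGE_RATE_USD = 0.04


def overage_cost(plan: str, premium_requests_used: float) -> float:
    """Estimate one month's overage charge in USD (hypothetical helper).

    `premium_requests_used` is assumed to already include model
    multipliers, e.g. one Claude Opus 4 request counts as 10.
    """
    allowance = PLAN_ALLOWANCE[plan]
    extra = max(0.0, premium_requests_used - allowance)
    return round(extra * OVERAGE_RATE_USD, 2)


# 450 premium requests on Copilot Business is 150 over the allowance.
print(overage_cost("Copilot Business", 450))
```

Usage below the allowance yields a charge of zero, which matches the "additional premium request" wording of the overage rule.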
Accuracy Rankings
Top Performers by Benchmark
HumanEval (Code Generation)
- Claude Opus 4: ~94% – Highest HumanEval score in this comparison
- Claude Sonnet 3.5: 93.7% – Excellent for code generation
- o3: ~92% – Strong reasoning-based coding
- Claude Sonnet 4: ~92% – Balanced performance
- Claude Sonnet 3.7 Thinking: ~92% – Enhanced reasoning
SWE-bench (Real-world Software Engineering)
- Claude Sonnet 4: 72.7% – Highest SWE-bench score in this comparison
- Claude Opus 4: 72.5% – Nearly tied for first
- o3: ~72% – Strong on complex problems
- Claude Sonnet 3.7 Thinking: ~64% – Good results from explicit reasoning
- Gemini 2.5 Pro: 63.8% – Solid on engineering tasks
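A ranking like the ones above can be derived directly from the benchmark columns of the overview table. The sketch below sorts by SWE-bench score; the scores are copied from the table, values marked "~" there are treated as exact for sorting purposes, and the helper name `top_n` is hypothetical.

```python
# SWE-bench scores (percent) copied from the overview table.
# Values shown with "~" in the table are approximate.
SWE_BENCH = {
    "GPT-4.1": 55.0,
    "GPT-4o": 33.0,
    "o3": 72.0,
    "o4-mini": 49.0,
    "Claude Opus 4": 72.5,
    "Claude Sonnet 3.5": 60.0,
    "Claude Sonnet 3.7": 62.0,
    "Claude Sonnet 3.7 Thinking": 64.0,
    "Claude Sonnet 4": 72.7,
    "Gemini 2.5 Pro": 63.8,
    "Gemini 2.0 Flash": 45.0,
}


def top_n(scores: dict[str, float], n: int = 5) -> list[tuple[str, float]]:
    """Return the n highest-scoring (model, score) pairs, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


for name, score in top_n(SWE_BENCH):
    print(f"{name}: {score}%")
```

Because several scores are approximate, models that sit within a fraction of a point of each other are better read as roughly tied than as strictly ordered.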
Model Selection Guide
For Low Cost and Quick Tasks
- Gemini 2.0 Flash (0.25× multiplier) – Best value for quick tasks
- o4-mini (0.33× multiplier) – Fast, lightweight responses
- GPT-4.1/GPT-4o (0× multiplier) – Included with paid plans; no premium requests consumed
For Complex Reasoning
- Claude Opus 4 (10× multiplier) – Most powerful, highest cost
- o3 (1× multiplier) – Strong reasoning at standard cost
- Claude Sonnet 4 (1× multiplier) – Balanced performance and cost
For Visual Tasks
- Gemini 2.0 Flash – Optimized for visual input, low cost
- GPT-4o – Strong visual capabilities, included with paid plans (0× multiplier)
- Claude Sonnet 4 – Vision support with reasoning
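One way to turn this guide into a repeatable choice is to filter by task area and a multiplier budget, then take the best benchmark score among what remains. The sketch below does that for a subset of the table. The selection rule, the short task labels, and the names `Model` and `pick_model` are assumptions for illustration only; the guide itself does not prescribe an algorithm.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    task: str               # shortened "Primary Task Area" label (assumed)
    swe_bench: float        # percent; "~" values in the table are approximate
    paid_multiplier: float  # premium request multiplier on paid plans


# A subset of the overview table.
MODELS = [
    Model("GPT-4.1", "general", 55.0, 0.0),
    Model("o3", "reasoning", 72.0, 1.0),
    Model("o4-mini", "simple", 49.0, 0.33),
    Model("Claude Opus 4", "reasoning", 72.5, 10.0),
    Model("Claude Sonnet 4", "reasoning", 72.7, 1.0),
    Model("Gemini 2.0 Flash", "visual", 45.0, 0.25),
]


def pick_model(task: str, max_multiplier: float) -> Optional[Model]:
    """Pick the highest SWE-bench model for a task within a multiplier budget."""
    candidates = [
        m for m in MODELS
        if m.task == task and m.paid_multiplier <= max_multiplier
    ]
    return max(candidates, key=lambda m: m.swe_bench, default=None)


choice = pick_model("reasoning", max_multiplier=1.0)
print(choice.name if choice else "no model fits")
```

Ranking on SWE-bench alone is a simplification; latency, vision support, or agent mode may matter more for a given workflow.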
Notes
- *Effective cost per request = $0.04 base rate (the overage price per premium request) × the paid-plan multiplier
- Premium request counters reset monthly on the 1st at 00:00:00 UTC
- Unused requests don’t carry over to the next month
- Rate limiting applies during high-demand periods
- Model availability varies by plan type
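The effective cost note can be checked with a few lines of arithmetic. This sketch recomputes the "Effective Cost per Request" column for several models from the $0.04 base rate and the paid-plan multipliers in the overview table; the function name `effective_cost` is hypothetical.

```python
BASE_RATE_USD = 0.04  # base rate per premium request, from the notes

# Paid-plan multipliers copied from the overview table.
PAID_MULTIPLIER = {
    "GPT-4.1": 0.0,
    "o3": 1.0,
    "o4-mini": 0.33,
    "Claude Opus 4": 10.0,
    "Claude Sonnet 3.7 Thinking": 1.25,
    "Gemini 2.0 Flash": 0.25,
}


def effective_cost(model: str) -> float:
    """Effective cost per request in USD: base rate times multiplier."""
    return BASE_RATE_USD * PAID_MULTIPLIER[model]


for model in PAID_MULTIPLIER:
    print(f"{model}: ${effective_cost(model):.3f}")
```

For o4-mini the product is $0.0132, which the table rounds to $0.013.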