IA al Día
the efficient way to stay informed
Back to archive
Models June 3, 2026 analysis 6 min read

Qwen 3.7 Plus vs 3.6 Plus, 3.7 Max, DeepSeek V4 Pro and Opus 4.8: the definitive coding and agent comparison

On June 1, 2026, Alibaba launched Qwen 3.7 Plus, and suddenly the conversation about which model to use for agentic coding was completely reordered.

Qwen 3.7 Plus vs 3.6 Plus, 3.7 Max, DeepSeek V4 Pro and Opus 4.8: the definitive coding and agent comparison
By IA al Día

On June 1, 2026, Alibaba launched Qwen 3.7 Plus, and suddenly the conversation about which model to use for agentic coding was completely reordered. Not because it leads every benchmark — it doesn’t — but because it combines three things that until now did not coexist in a single model: low price, multimodal vision, and an autonomy ceiling of 35 hours. Eleven days earlier, Qwen 3.7 Max had arrived, the text-only flagship. Two months earlier, DeepSeek V4 Pro had shaken up the market with open weights and unbeatable prices. And barely a week earlier, Anthropic had launched Claude Opus 4.8, the new king of SWE-Bench Pro. All are competing for the same space. Each excels at something different.

This is the comparison that brings together all five contenders across the dimensions that matter for development teams: coding benchmarks, agentic capabilities, price, and which workload each one suits best.

Technical specifications

Feature Qwen 3.7 Plus Qwen 3.6 Plus Qwen 3.7 Max DeepSeek V4 Pro Claude Opus 4.8
Release Jun 2026 Apr 2026 May 2026 Apr 2026 May 2026
Modality Text + Image + Video Text + Image Text only Text only Text only
Context 1M tokens 1M tokens 1M tokens 1M tokens 200K tokens
Max output 65,536 tok 384,000 tok
Parameters API-only (proprietary) API-only (proprietary) API-only (proprietary) 1.6T / 49B active (MIT) API-only (closed)
Autonomy 35h / 1000+ tools ~35h 35h / 1158 tools Not specified Not specified
Self-hosting ✅ (MIT)

Pricing per million tokens

Model Input Output Cached input Ratio vs cheapest
DeepSeek V4 Pro (OpenRouter) $0.435 $0.87 ~$0.014 (cache hit)
Qwen 3.6 Plus (OpenRouter) $0.325 $1.95 2.24×
Qwen 3.7 Plus (OpenRouter) $0.40 $1.60 $0.08 1.84×
Qwen 3.7 Max (OpenRouter) $1.25 $3.75 $0.25 4.31×
Claude Opus 4.8 (direct) $5.00 $25.00 $0.50 28.7×

The strongest takeaway from this table: DeepSeek V4 Pro costs 29 times less than Opus 4.8 on output, while delivering competitive coding performance. Qwen 3.7 Plus lands at an interesting midpoint: cheaper than its predecessor (3.6 Plus) on output ($1.60 vs $1.95), and drastically cheaper than its bigger sibling Max ($1.60 vs $7.50).

Coding benchmarks

Benchmark Qwen 3.7 Plus Qwen 3.6 Plus Qwen 3.7 Max DeepSeek V4 Pro Opus 4.8
SWE-Bench Pro ~60% 56.6% 60.6% 59.0% 69.2% 🏆
SWE-Bench Verified ~79%* 78.8% 80.4% 80.6%
Terminal-Bench 2.0 61.6% 69.7% 🏆 67.9%
LiveCodeBench 93.5% 🏆

*Estimated. Qwen 3.7 Plus shares its text backbone with 3.7 Max.

What each benchmark says:

  • SWE-Bench Pro: The hardest. Real multi-file bugs in open repos. Opus 4.8 dominates with 69.2%, but costs 15× more than Qwen 3.7 Plus and 29× more than DeepSeek.
  • SWE-Bench Verified: More accessible tasks. DeepSeek V4 Pro (80.6%) and Qwen 3.7 Max (80.4%) are essentially tied. Qwen 3.6 Plus trails ~2 points behind.
  • Terminal-Bench 2.0: Agentic shell execution. Qwen 3.7 Max (69.7) leads, followed by DeepSeek (67.9). Qwen 3.6 Plus (61.6) lags behind.
  • LiveCodeBench: Competitive coding. DeepSeek V4 Pro (93.5%) crushes Opus 4.7 (84.7%) and GPT-5.5 (85.3%).

Agentic benchmarks (tool calling)

Benchmark Qwen 3.7 Plus* Qwen 3.6 Plus Qwen 3.7 Max DeepSeek V4 Pro Opus 4.6 Max†
BFCL-V4 (tool calling) ~74 68.9 75.0 70.6 76.7 🏆
MCP-Mark (MCP tools) 48.2 60.8 57.1 56.7
MCP-Atlas (MCP ecosystem) 76.4 74.1 76.4 73.6 75.8

*Estimated. Qwen 3.7 Plus inherits the agentic backbone from 3.7 Max. †Data for Opus 4.6, not 4.8 (4.8 data not available for BFCL/MCP).

Qwen 3.7 Max leads in tool calling among Chinese models. DeepSeek V4 Pro is a few points behind. Qwen 3.7 Plus, sharing the same agentic stack, should be close to Max. On MCP-Atlas, Plus and Max get exactly the same score (76.4), suggesting the agentic backbone is identical.

Multimodal capabilities

Capability Qwen 3.7 Plus Qwen 3.6 Plus Qwen 3.7 Max DeepSeek V4 Pro Opus 4.8
Vision (image) ❌*
Video
Computer use ✅ (GUI navigation) ✅ (beta)
Vision Arena rank #16 🏆 N/A N/A N/A

*Opus 4.8 has computer use to view screens, but not native multimodal vision.

Qwen 3.7 Plus is the only model in this comparison with native vision (image + video) at text-level pricing. This changes the calculus for teams working with visual references: UI mockups, bug screenshots, wireframes. DeepSeek V4 Pro and Qwen 3.7 Max cannot do this. Opus 4.8 can view screens through computer use, but that is an additional layer, not native multimodal processing.

Decision matrix

Scenario Recommended model Why
Daily driver for coding (default) Qwen 3.7 Plus Price-performance-vision balance. 6× cheaper than Max, sees images, same autonomy ceiling.
Maximum coding performance Opus 4.8 SWE-Bench Pro 69.2%. For deep debugging where cost doesn't matter.
Tight budget, high volume DeepSeek V4 Pro Output at $0.87/M, open-weight, LiveCodeBench 93.5%. Best quality-price ratio for coding.
Extreme autonomy (24h+ pipelines) Qwen 3.7 Plus / Max 35h with >1000 verified tool calls. No other model has this documented.
Self-hosting / privacy DeepSeek V4 Pro (MIT) or Qwen 35B-A3B Open weights. Qwen 3.7 Plus and Max are API-only.
Coding with visual references Qwen 3.7 Plus Only one with native vision at a competitive price.
Intensive tool calling Qwen 3.7 Max / Plus BFCL-V4 75.0, MCP-Atlas 76.4. Lead in tool calling among non-Anthropic models.

Verdict

There is no absolute winner, but there is a clear trend. Qwen 3.7 Plus is the most balanced model on the market today for development teams doing agentic coding. It doesn’t lead any individual benchmark — Opus 4.8 wins at SWE-Bench Pro, DeepSeek V4 Pro wins on price and LiveCodeBench, Qwen 3.7 Max wins at Terminal-Bench and tool calling. But Plus is the only one that simultaneously covers solid code performance, low price, multimodal vision, and extreme autonomy.

The era of the “best model” is over. Now it’s about choosing the right combination of attributes for each workload.


Main source: Qwen 3.7 Plus vs Qwen 3.7 Max — ofox.ai

More in this category