Qwen 3.7 Plus vs 3.6 Plus, 3.7 Max, DeepSeek V4 Pro and Opus 4.8: the definitive coding and agent comparison

Qwen 3.7 Plus has reordered the conversation about which model to use for agentic coding. Not because it leads every benchmark (it doesn’t), but because it combines three things that until now did not coexist in a single model: low price, multimodal vision, and an autonomy ceiling of 35 hours. Released on June 1, 2026 by Alibaba, it joins a wave of models competing for the same space, each excelling at something different. Eleven days earlier, Qwen 3.7 Max had arrived, the text-only flagship. Two months earlier, DeepSeek V4 Pro had shaken up the market with open weights and unbeatable prices. And barely a week earlier, Anthropic had launched Claude Opus 4.8, the new king of SWE-Bench Pro.

This is the comparison that brings together all five contenders across the dimensions that matter for development teams: coding benchmarks, agentic capabilities, price, and which workload each one suits best.

Technical specifications

Feature	Qwen 3.7 Plus	Qwen 3.6 Plus	Qwen 3.7 Max	DeepSeek V4 Pro	Claude Opus 4.8
Release	Jun 2026	Apr 2026	May 2026	Apr 2026	May 2026
Modality	Text + Image + Video	Text + Image	Text only	Text only	Text only
Context	1M tokens	1M tokens	1M tokens	1M tokens	200K tokens
Max output	—	—	65,536 tokens	384,000 tokens	—
Parameters	API-only (proprietary)	API-only (proprietary)	API-only (proprietary)	1.6T / 49B active (MIT)	API-only (closed)
Autonomy	35h / 1000+ tools	~35h	35h / 1158 tools	Not specified	Not specified
Self-hosting	❌	❌	❌	✅ (MIT)	❌

Pricing per million tokens

Model	Input	Output	Cached input	Output price ratio vs cheapest
DeepSeek V4 Pro (OpenRouter)	$0.435	$0.87	~$0.014 (cache hit)	1×
Qwen 3.6 Plus (OpenRouter)	$0.325	$1.95	—	2.24×
Qwen 3.7 Plus (OpenRouter)	$0.40	$1.60	$0.08	1.84×
Qwen 3.7 Max (OpenRouter)	$1.25	$3.75	$0.25	4.31×
Claude Opus 4.8 (direct)	$5.00	$25.00	$0.50	28.7×

The strongest takeaway from this table: DeepSeek V4 Pro costs about 29 times less than Opus 4.8 on output ($0.87 vs $25.00), while delivering competitive coding performance. Qwen 3.7 Plus lands at an interesting midpoint: cheaper than its predecessor (3.6 Plus) on output ($1.60 vs $1.95), and less than half the output price of its bigger sibling Max ($1.60 vs $3.75).
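The ratio column in the pricing table can be reproduced with a few lines of Python. This is a minimal sketch: the prices are copied from the table, while the function names (output_ratios, job_cost) and the sample workload of 2M input and 500K output tokens are hypothetical, chosen only for illustration. Cache discounts are ignored.

```python
# Prices in USD per million tokens, copied from the pricing table above.
PRICES = {
    "DeepSeek V4 Pro": {"input": 0.435, "output": 0.87},
    "Qwen 3.6 Plus": {"input": 0.325, "output": 1.95},
    "Qwen 3.7 Plus": {"input": 0.40, "output": 1.60},
    "Qwen 3.7 Max": {"input": 1.25, "output": 3.75},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}


def output_ratios(prices):
    """Divide each model's output price by the cheapest output price."""
    cheapest = min(p["output"] for p in prices.values())
    return {name: p["output"] / cheapest for name, p in prices.items()}


def job_cost(prices, model, input_tokens, output_tokens):
    """Cost in USD of one job at list price, ignoring cache discounts."""
    p = prices[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


for name, ratio in output_ratios(PRICES).items():
    print(f"{name}: {ratio:.2f}x the cheapest output price")

# Hypothetical workload: 2M input tokens and 500K output tokens per job.
for name in PRICES:
    print(f"{name}: ${job_cost(PRICES, name, 2_000_000, 500_000):.2f} per job")
```

Because agentic workloads tend to be input-heavy, the per-job figure can rank models differently from the output-only ratio, so it is worth running both with your own token mix.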

Coding benchmarks

Benchmark	Qwen 3.7 Plus	Qwen 3.6 Plus	Qwen 3.7 Max	DeepSeek V4 Pro	Opus 4.8
SWE-Bench Pro	~60%	56.6%	60.6%	59.0%	69.2% 🏆
SWE-Bench Verified	~79%*	78.8%	80.4%	80.6%	—
Terminal-Bench 2.0	—	61.6%	69.7% 🏆	67.9%	—
LiveCodeBench	—	—	—	93.5% 🏆	—

*Estimated. Qwen 3.7 Plus shares its text backbone with 3.7 Max.

What each benchmark says:

SWE-Bench Pro: The hardest. Real multi-file bugs in open repos. Opus 4.8 dominates with 69.2%, but its output tokens cost more than 15× those of Qwen 3.7 Plus and about 29× those of DeepSeek V4 Pro.
SWE-Bench Verified: More accessible tasks. DeepSeek V4 Pro (80.6%) and Qwen 3.7 Max (80.4%) are essentially tied. Qwen 3.6 Plus trails ~2 points behind.
Terminal-Bench 2.0: Agentic shell execution. Qwen 3.7 Max (69.7%) leads, followed by DeepSeek V4 Pro (67.9%). Qwen 3.6 Plus (61.6%) lags behind.
LiveCodeBench: Competitive coding. DeepSeek V4 Pro (93.5%) is well ahead of Opus 4.7 (84.7%) and GPT-5.5 (85.3%), two models outside this comparison; no LiveCodeBench score is listed here for Opus 4.8.
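The SWE-Bench Pro trade-off described in the list above (more points at a much higher price) can be made concrete. The sketch below takes the scores and output prices from the tables in this article; the helper name compare_to is hypothetical, and the Qwen 3.7 Plus score is the article's estimate (~60%), not a measured value.

```python
# (SWE-Bench Pro score in %, output price in USD per million tokens),
# copied from the tables above. The Qwen 3.7 Plus score is an estimate.
MODELS = {
    "Qwen 3.7 Plus": (60.0, 1.60),
    "DeepSeek V4 Pro": (59.0, 0.87),
    "Opus 4.8": (69.2, 25.00),
}


def compare_to(reference, other, models=MODELS):
    """Return (score gap in points, output price multiple) of reference over other."""
    ref_score, ref_price = models[reference]
    other_score, other_price = models[other]
    return round(ref_score - other_score, 1), round(ref_price / other_price, 1)


for other in ("Qwen 3.7 Plus", "DeepSeek V4 Pro"):
    gap, multiple = compare_to("Opus 4.8", other)
    print(f"Opus 4.8 vs {other}: +{gap} points at {multiple}x the output price")
```

Whether roughly ten extra points justify that multiple depends on how expensive a failed fix is for the team, which is the judgment the decision matrix later in the article encodes.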

Agentic benchmarks (tool calling)

Benchmark	Qwen 3.7 Plus*	Qwen 3.6 Plus	Qwen 3.7 Max	DeepSeek V4 Pro	Opus 4.6 Max†
BFCL-V4 (tool calling)	~74	68.9	75.0	70.6	76.7 🏆
MCP-Mark (MCP tools)	—	48.2	60.8	57.1	56.7
MCP-Atlas (MCP ecosystem)	76.4	74.1	76.4	73.6	75.8

*Estimated. Qwen 3.7 Plus inherits the agentic backbone from 3.7 Max. †Data for Opus 4.6, not 4.8 (4.8 data not available for BFCL/MCP).

Qwen 3.7 Max leads in tool calling among Chinese models. DeepSeek V4 Pro is a few points behind. Qwen 3.7 Plus, sharing the same agentic stack, should be close to Max. On MCP-Atlas, Plus and Max are listed with exactly the same score (76.4), which is consistent with an identical agentic backbone, although the Plus column is marked as estimated.

Multimodal capabilities

Capability	Qwen 3.7 Plus	Qwen 3.6 Plus	Qwen 3.7 Max	DeepSeek V4 Pro	Opus 4.8
Vision (image)	✅	✅	❌	❌	❌*
Video	✅	❌	❌	❌	❌
Computer use	✅ (GUI navigation)	—	❌	❌	✅ (beta)
Vision Arena rank	#16 🏆	—	N/A	N/A	N/A

*Opus 4.8 has computer use to view screens, but not native multimodal vision.

Qwen 3.7 Plus is the only model in this comparison with native vision (image + video) at text-level pricing. This changes the calculus for teams working with visual references: UI mockups, bug screenshots, wireframes. DeepSeek V4 Pro and Qwen 3.7 Max cannot do this. Opus 4.8 can view screens through computer use, but that is an additional layer, not native multimodal processing.

Decision matrix

Scenario	Recommended model	Why
Daily driver for coding (default)	Qwen 3.7 Plus	Price-performance-vision balance. About 2.3× cheaper than Max on output ($1.60 vs $3.75), sees images, same autonomy ceiling.
Maximum coding performance	Opus 4.8	SWE-Bench Pro 69.2%. For deep debugging where cost doesn't matter.
Tight budget, high volume	DeepSeek V4 Pro	Output at $0.87/M, open-weight, LiveCodeBench 93.5%. Best quality-price ratio for coding.
Extreme autonomy (24h+ pipelines)	Qwen 3.7 Plus / Max	35h with >1000 verified tool calls. Neither DeepSeek V4 Pro nor Opus 4.8 has a documented figure.
Self-hosting / privacy	DeepSeek V4 Pro (MIT) or Qwen 35B-A3B	Open weights. Qwen 3.7 Plus and Max are API-only.
Coding with visual references	Qwen 3.7 Plus	Only one with native image and video input at a competitive price.
Intensive tool calling	Qwen 3.7 Max / Plus	BFCL-V4 75.0, MCP-Atlas 76.4. Lead in tool calling among non-Anthropic models.
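For teams that route requests programmatically, the decision matrix above can be encoded as a simple lookup. This is only a sketch: the scenario keys and the recommend function are hypothetical names, and the recommendations are copied verbatim from the matrix rather than derived from any scoring logic.

```python
# Scenario keys are hypothetical labels; the recommendations are copied
# from the decision matrix above.
DECISION_MATRIX = {
    "daily_driver": "Qwen 3.7 Plus",
    "max_coding_performance": "Opus 4.8",
    "tight_budget_high_volume": "DeepSeek V4 Pro",
    "extreme_autonomy": "Qwen 3.7 Plus / Max",
    "self_hosting": "DeepSeek V4 Pro (MIT) or Qwen 35B-A3B",
    "visual_references": "Qwen 3.7 Plus",
    "intensive_tool_calling": "Qwen 3.7 Max / Plus",
}


def recommend(scenario: str) -> str:
    """Return the recommended model for a scenario, or raise on unknown keys."""
    try:
        return DECISION_MATRIX[scenario]
    except KeyError:
        known = ", ".join(sorted(DECISION_MATRIX))
        raise ValueError(f"Unknown scenario {scenario!r}; expected one of: {known}") from None


print(recommend("visual_references"))
```

Failing loudly on an unknown scenario, instead of silently falling back to a default model, keeps routing mistakes visible while the matrix is still being tuned.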

Verdict

There is no absolute winner, but there is a clear trend. Qwen 3.7 Plus is the most balanced model on the market today for development teams doing agentic coding. It doesn’t lead any individual benchmark — Opus 4.8 wins at SWE-Bench Pro, DeepSeek V4 Pro wins on price and LiveCodeBench, Qwen 3.7 Max wins at Terminal-Bench and tool calling. But Plus is the only one that simultaneously covers solid code performance, low price, multimodal vision, and extreme autonomy.

The era of the “best model” is over. Now it’s about choosing the right combination of attributes for each workload.

Main source: Qwen 3.7 Plus vs Qwen 3.7 Max — ofox.ai