Two days after the leaks pointed to an imminent launch, Anthropic made it official. Claude Opus 4.7 shipped on 16 April 2026, generally available across Claude.ai, the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Same pricing as Opus 4.6: $5 per million input tokens, $25 per million output tokens.
This is not a minor point release. The coding benchmarks are meaningfully better, the vision system was rebuilt from scratch, and two new developer controls (xhigh effort and task budgets) change how you manage cost and quality on long-running agent workflows. Here is what actually changed and what it means for production systems.
What actually shipped
The model identifier is claude-opus-4-7. It is a drop-in replacement for claude-opus-4-6 across all API endpoints. The core improvements break down into four areas:
- Coding: 13% improvement in task resolution over Opus 4.6. On SWE-bench Verified, Opus 4.7 scores 87.6% (up from 80.8%). On SWE-bench Pro, 64.3% (up from 53.4%).
- Document reasoning: 21% fewer errors on document-heavy tasks.
- Production task resolution: 3x more production-grade tasks completed versus Opus 4.6.
- Instruction following: Substantially improved precision on detailed, multi-step prompts.
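Because the identifier is a drop-in replacement, the migration itself is a one-field change. A minimal sketch, assuming requests follow the Messages API shape (`build_request` is a hypothetical helper, not part of any SDK):

```python
# The only field that changes between a 4.6 and a 4.7 request is the
# model id; everything else in the request stays identical.

def build_request(prompt: str, model: str) -> dict:
    """Assemble kwargs suitable for a Messages API call."""
    return {
        "model": model,
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }

old = build_request("Summarise this diff.", "claude-opus-4-6")
new = build_request("Summarise this diff.", "claude-opus-4-7")
# old and new differ only in the "model" field.
```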
Vercel's engineering team reported a behaviour they had not seen in any prior Claude model: Opus 4.7 "does proofs on systems code before starting work." The model verifies its own reasoning before writing output, which explains the improved consistency on complex tasks.
The benchmarks, in context
Numbers in isolation are noise. Here is where Opus 4.7 sits relative to the models you are most likely comparing it against:
SWE-bench Pro (autonomous coding):
- Claude Opus 4.7: 64.3%
- GPT-5.4: 57.7%
- Gemini 3.1 Pro: 54.2%
- Claude Opus 4.6: 53.4%
CursorBench (AI code editor performance):
- Claude Opus 4.7: 70%
- Claude Opus 4.6: 58%
Humanity's Last Exam (without tools):
- Claude Opus 4.7: 46.9%
- Gemini 3.1 Pro: 44.4%
- GPT-5.4 Pro: 42.7%
- Claude Opus 4.6: 40.0%
The CursorBench result is worth pausing on. For developers who spend most of their day inside Cursor or Claude Code, this benchmark measures something close to the actual experience. A 12-point jump is not marginal.
Agentic reasoning improved by 14%, with tool-call errors dropping to a third of the Opus 4.6 rate. If you run multi-step agent workflows that chain API calls, file operations, or database queries, the reliability upgrade is significant.
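In practice, a lower tool-call error rate mostly means your existing retry path fires less often. A generic sketch of such a wrapper (all names here are illustrative, not part of any SDK; `ValueError` stands in for a malformed tool-call error):

```python
import time

def call_tool_with_retry(tool, args, retries=3, backoff=0.5):
    """Run a tool call, retrying transient failures with
    exponential backoff before giving up."""
    for attempt in range(retries):
        try:
            return tool(**args)
        except ValueError:
            if attempt == retries - 1:
                raise  # exhausted retries: surface the error
            time.sleep(backoff * 2 ** attempt)
```

Fewer malformed calls from the model means fewer trips through the `except` branch, which compounds across long chains of file operations and database queries.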
Vision: 3x the resolution
Opus 4.7 accepts images up to 2,576 pixels on the long edge, roughly 3.75 megapixels. Opus 4.6 capped out at 1.15 megapixels. That is more than triple the resolution.
This matters for specific workflows. Technical diagrams, architectural drawings, dense spreadsheet screenshots, and design mockups now come through at actual fidelity instead of being downscaled to the point where small text or fine details are lost. If you have ever sent a screenshot of a Figma board to Claude and gotten garbled results, this is the fix.
Anthropic also reports improved interpretation of chemical structures and technical diagrams, which points to training data improvements beyond just the resolution increase.
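The practical effect is how much less you need to downscale before sending. A small helper (hypothetical name; the 2,576px long-edge cap is from the announcement) that computes the scale factor an image needs under a given limit:

```python
def downscale_factor(width: int, height: int, long_edge_cap: int = 2576) -> float:
    """Return the factor to multiply both dimensions by so the
    long edge fits the cap; 1.0 means no downscaling needed."""
    long_edge = max(width, height)
    return min(1.0, long_edge_cap / long_edge)

# A 4K screenshot (3840x2160) now only shrinks to ~2576x1449,
# keeping small text and fine diagram details legible.
factor = downscale_factor(3840, 2160)
new_size = (round(3840 * factor), round(2160 * factor))
```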
xhigh effort and task budgets
Two new developer controls shipped alongside the model:
xhigh effort level. Until now, the effort parameter had four settings: low, medium, high, and max. Opus 4.7 adds xhigh, which sits between high and max. This gives you finer control over the reasoning-depth-versus-latency tradeoff. For tasks where high is not thorough enough but max burns too many tokens and takes too long, xhigh fills the gap.
Task budgets. A new parameter that lets you guide how the model allocates tokens across longer tasks. Instead of hoping the model spends its context wisely, you can set explicit budgets for different phases of a multi-step workflow. This is particularly useful for agent systems that need to reserve capacity for later steps rather than exhausting the context window early.
Both features are API-level controls. If you are building on the Claude API directly, you can start using them immediately with the claude-opus-4-7 model identifier.
GitHub Copilot and platform availability
Opus 4.7 rolled out simultaneously on GitHub Copilot, available to Pro+, Business, and Enterprise subscribers. You can select it through the model picker in VS Code, Visual Studio, JetBrains IDEs, Xcode, Eclipse, the Copilot CLI, and GitHub Mobile.
GitHub describes the improvement as "stronger multi-step task performance and more reliable agentic execution." The model carries a 7.5x premium request multiplier as promotional pricing through 30 April 2026. After that, expect standard Opus-tier pricing.
The rollout will gradually replace Opus 4.5 and Opus 4.6 in Copilot Pro+ as part of GitHub's model consolidation. If you are on Copilot and have been defaulting to Opus 4.6, the switch to 4.7 will happen automatically over the coming weeks.
Beyond GitHub, Opus 4.7 is available day-one on Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, alongside the direct API.
The Mythos footnote
Anthropic's announcement included a reference to "Mythos Preview" as the company's best-aligned model, scoring 56.8% on Humanity's Last Exam compared to Opus 4.7's 46.9%. Mythos is not publicly available and operates under ASL-3 safety protocols with controlled rollout.
Anthropic explicitly stated that Opus 4.7 "does not advance our capability frontier." Translation: Mythos-class models are already ahead internally. Opus 4.7 is the best model available to everyone, but it is not the best model Anthropic has.
For anyone building production systems, this is informative but not actionable yet. Build for Opus 4.7 now. Keep your architecture flexible for when Mythos-class capabilities reach general availability.
What the leaks got right (and wrong)
Two weeks ago, I broke down what the npm package leak and the CMS dump revealed about Opus 4.7. Here is how the predictions held up:
Correct:
- Opus 4.7 exists and shipped within the predicted window
- Significant improvements on multi-step reasoning and coding tasks
- Same pricing as Opus 4.6 ($5/$25 per million tokens)
- Mythos is real and positioned above Opus as a next-generation family
- Dense decoder transformer architecture (not mixture of experts)
Partially correct:
- The leaked SWE-bench Verified prediction of 85-90% was close: actual is 87.6%
- The Capybara tier exists internally, but Anthropic has not announced it as a public tier
Still unknown:
- Whether Sonnet 4.8 is coming alongside or later
- The scope of the rumoured AI design tool
- Whether voice mode from the feature flags is near-term
The leaks were directionally accurate. The specific numbers were aspirational targets that ended up close to final performance. For anyone who read the leak analysis and made architectural decisions based on it, those decisions hold.
What to do if you build on Claude
Test before switching production workloads. The 4.6 to 4.7 upgrade should be smooth (backward compatible across the 4.x series), but "should" is not "guaranteed." Run your eval suite against claude-opus-4-7 before swapping the config.
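The gate can be as simple as comparing pass rates before promoting the new identifier. A minimal sketch (the eval runner and scores are stand-ins for whatever your suite already produces):

```python
def should_promote(run_eval, candidate, baseline, min_delta=0.0):
    """Promote `candidate` only if its eval pass rate matches or
    beats the baseline's, plus an optional safety margin."""
    return run_eval(candidate) >= run_eval(baseline) + min_delta

# Stub scores standing in for a real eval run:
scores = {"claude-opus-4-6": 0.82, "claude-opus-4-7": 0.88}
promote = should_promote(scores.get, "claude-opus-4-7", "claude-opus-4-6")
```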
Try xhigh effort on your hardest tasks. If you have prompts where high effort produces inconsistent results but max is too slow, xhigh is designed for exactly that gap. Test it on your toughest reasoning chains.
Revisit your vision workflows. If you previously avoided sending images to Claude because the resolution was too low, the 3x improvement changes the calculus. Screenshot-based workflows, diagram analysis, and document processing all benefit.
Evaluate task budgets for agent systems. If you run multi-step agents that occasionally run out of context or spend too many tokens on early steps, task budgets give you direct control. This is a meaningful upgrade for complex automation pipelines.
Keep your architecture model-agnostic. Mythos is coming. Capybara may ship as a public tier. The model landscape is shifting faster than hardcoded model names can track. Config-driven model selection is not optional anymore.
For systems I have built for clients, the migration path is the same as every prior version bump: update the model identifier in config, run the validation suite, confirm output quality, deploy. No code changes needed. That is what config-driven architecture buys you.
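Config-driven selection can be as simple as resolving the model id from the environment with a pinned fallback (the variable name is illustrative):

```python
import os

def resolve_model(default: str = "claude-opus-4-7") -> str:
    """Read the model id from the environment so upgrades are
    config changes, not code changes."""
    return os.environ.get("CLAUDE_MODEL", default)

model = resolve_model()  # deploys flip CLAUDE_MODEL, not source
```

When Mythos-class models reach general availability, the same deploy-time switch covers that transition too.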
If you are building AI systems and want an architecture that handles model transitions without breaking, the AI Automation Systems service is built around exactly this kind of production resilience. And if you need help deciding how Opus 4.7's new features (xhigh effort, task budgets, improved vision) fit your specific use case, the AI Consulting and Roadmapping service starts with that assessment.