You submit it in the morning, it's done by evening! Claude Sonnet 4.5 changes the game

Anthropic introduced Claude Sonnet 4.5, which tops the charts in coding benchmarks like SWE-bench
The model can work autonomously for up to 30 hours, compared to 7 hours for its predecessor Opus 4
The price remains the same as for Sonnet 4: 3 dollars per million input tokens, 15 for output

Sdílejte:

Jakub Kárník

Published: 30. 9. 2025 06:30

The startup Anthropic, behind the Claude chatbot, has just introduced a new model, Claude Sonnet 4.5. The company calls it the world’s best coding AI model and the most powerful tool for computer work. In reality, it’s an evolutionary, not revolutionary, step forward – but with some impressive numbers.

Thirty Hours of Continuous Work
Benchmarks: First Place, but Not in Everything
Agent SDK and New Features for Developers
Safety and "Alignment"
Experimental "Imagine with Claude"
Price and Availability
Is the Competition Sleeping, or Not?

Thirty Hours of Continuous Work

The main new feature is Sonnet 4.5’s ability to work autonomously for up to 30 hours. This is a significant leap compared to the Opus 4 model from May, which lasted a maximum of seven hours. During an internal test, Sonnet 4.5 created a functional clone of a Slack or Teams-like communication application – and wrote approximately 11,000 lines of code for it.

Anthropic claims that the model maintains attention even during multi-day tasks without losing context. In practice, this means a developer can submit a complex request in the morning and pick up the finished result in the evening. It sounds impressive, but the reality will probably be more prosaic – few projects can do without human oversight and iterations.

Benchmarks: First Place, but Not in Everything

Claude Sonnet 4.5 dominates SWE-bench Verified, a benchmark measuring the ability to solve real-world software tasks. Anthropic achieved an average score of 77.2% from a series of ten attempts. With advanced techniques like parallel test-time compute, the score climbed to 82.0%.

Another impressive result came from OSWorld, a benchmark for computer control – Sonnet 4.5 achieved 61.4% here, while its predecessor Sonnet 4 scored 42.2% last year. The model can browse websites, fill out spreadsheets, and complete multi-layered tasks directly in the browser.

Anthropic also published results from mathematical and logical tests (AIME, GPQA Diamond), where Sonnet 4.5 surpasses older Claude models, but in some categories lags behind OpenAI GPT-5 or Google Gemini 2.5 Pro. Interestingly, the model proved particularly effective in specialized areas such as finance, law, medicine, and STEM – even if it only achieves “grades C to D” there so far.

Agent SDK and New Features for Developers

Anthropic has released the Claude Agent SDK – the infrastructure on which their own Claude Code tool runs. Developers will thus get the building blocks for creating their own AI agents. The SDK includes memory management, a permissions system, and coordination between multiple agents working on a single goal.

Checkpoints have been added to Claude Code – the ability to save the ongoing state of work and return to it at any time. The Terminal has been redesigned, and native integration for VS Code has also been added. In Claude applications, code execution and file creation (spreadsheets, presentations, documents) are now available directly in the conversation.

Users of the premium Claude Max plan who signed up for the waiting list have gained access to a Chrome extension. This allows Claude to work directly in the browser – filling out forms, browsing pages, and automating repetitive tasks.

Safety and “Alignment”

Anthropic places great emphasis on Sonnet 4.5 being their most “aligned” model. In practice, this means the model behaves manipulatively less often – reducing the occurrence of flattery, deceptive behavior, desire for power, or support for user delusions.

The model is protected by the ASL-3 security framework, which includes classifiers detecting dangerous inputs and outputs – especially those associated with weapons of mass destruction (CBRN).

The number of false positive detections has been reduced by half since May and even by a factor of ten since the initial launch. The company promises further improvements.

Experimental “Imagine with Claude”

Together with Sonnet 4.5, Anthropic launched a temporary experiment “Imagine with Claude”. This is a tool that generates software on the fly – no functionality is pre-programmed, Claude creates code in real-time according to user requirements.

The experiment is available to Claude Max subscribers for five days at claude.ai/imagine. Anthropic describes it as a demonstration of what’s possible when you combine a powerful model with the right infrastructure.

Price and Availability

Claude Sonnet 4.5 is available starting today via API under the designation claude-sonnet-4-5. The pricing policy remains the same as for Sonnet 4: 3 dollars per million input tokens and 15 dollars per million output tokens.

The model can be used in Claude applications (web, mobile, desktop), via API, or in the Claude Code tool. Claude Code updates are available to all users, as are Agent SDK features for developers. Code execution and file creation work in all paid Claude application plans.

Is the Competition Sleeping, or Not?

The battle for the attention of developers and enterprise customers is waged almost week after week. A few days ago, OpenAI introduced Pulse – a ChatGPT feature for morning routines and ongoing research. Google is still refining its Gemini and pushing integration into enterprise tools.

What do you think of the new Sonnet 4.5 model?

Source: Anthropic, The Verge

About the author

Jakub Kárník

Jakub is known for his endless curiosity and passion for the latest technologies. His love for mobile phones started with an iPhone 3G, but nowadays… More about the author

Sdílejte: