Technical Review: Claude 3.7 Sonnet & Claude Code for Developers

Anthropic just released Claude 3.7 Sonnet and Claude Code, showing significant advancements in reasoning and AI development assistance that could fundamentally change how developers work with AI tools.

Claude 3.7 Sonnet and Claude Code Released

In this blog, we dive into technical details of the new model and Claude Code. We'll explore benchmarks, real-world projects, and why developers should pay attention to these releases. Let's dive in!

Key Takeaways

Claude 3.7 Sonnet excels in software engineering and agentic workflows, outperforming OpenAI, Grok, and DeepSeek in these categories.
OpenAI's models still lead in multilingual Q&A, visual reasoning, and mathematical problem-solving, with o3-mini dominating in complex math tasks.
Grok 3 showed superior performance in visual reasoning and high school mathematics than Claude 3.7 Sonnet.
Claude 3.5 Sonnet lags behind Claude 3.7 Sonnet in most areas (unsurprisingly).

What's new about Claude 3.7 Sonnet?

1. The first hybrid reasoning model

Users have full transparency into the model's thought process. Being the first hybrid reasoning model, Claude 3.7 Sonnet can operate in two distinct modes:

Standard mode: Quick responses for everyday tasks
Extended thinking mode: Deep reasoning for complex problems

2. API access for reasoning models

Claude 3.7 is available via Anthropic API, AWS Bedrock, and Google Vertex AI, making it one of the few reasoning models accessible via API.

Developers can set a "thinking budget" through the API by setting a maximum token limit for reasoning (up to 128K tokens).

3. Better coding capabilities

Claude 3.7 Sonnet establishes itself as an industry leader in code generation and understanding. The model:

Achieves state-of-the-art results on SWE-bench Verified (a benchmark for real-world software issues)
Excels at TAU-bench (which tests AI agents handling complex workflows)
Has been recognized as the preferred AI coding assistant by major development platforms including Cursor, Cognition, Vercel, and Replit

4. More advanced agentic abilities

Initial experimentation with Claude 3.7 Sonnet showed impressive agentic abilities:

Supports GitHub integration for a much deeper understanding of your codebase
Achieved an 81% success rate in online shopping tasks and 58.4% in booking flights

5. Improved safety and reduced refusals

The new model shows a 45% reduction in unnecessary refusals compared to Claude 3.5 Sonnet, while maintaining strong resistance to prompt injection attacks and other adversarial exploits.

6. Claude Code: a command-line AI assistant

Claude Code is a command-line AI assistant that integrates with development workflows, a completely new product category for Anthropic.

Unlike previous AI coding assistants, Claude Code runs in your terminal and directly modifies your local files, reducing the need for complex integrations or extra servers.

Key Features	Description
Repository-wide understanding	Reads entire codebase, enables context-aware suggestions, identify dependencies, and explains file structures.
Task automation	Handles search, editing, debugging and test writing.
Build debugging	Detects issues, fixes them, and retries until builds succeed.
GitHub integration	Manages GitHub tasks (commits, PRs, etc.), always requesting approval before making changes.

Start monitoring your Claude app with Helicone ⚡️

Track your LLM app usage and costs in production with 1-line of code.

import anthropic

client = anthropic.Anthropic(
  api_key=ANTHROPIC_API_KEY,
  base_url="https://anthropic.helicone.ai/{HELICONE_API_KEY}",
)

How much does Claude 3.7 Sonnet costs?

Claude 3.7 Sonnet pricing matches that of Claude 3.5 Sonnet:

$3 per million input tokens
$15 per million output tokens (including thinking tokens)

You can calculate the cost of Claude 3.7 Sonnet using Helicone's LLM API pricing calculator.

Claude 3.7 Sonnet Benchmark Comparison

Anthropic released a series of benchmarks to showcase Claude 3.7 Sonnet's capabilities.

Overall, Claude 3.7 is the best LLM for software engineering and building AI agents, but isn't the best at math or visual reasoning.

Claude 3.7 Official Benchmarks

Image Source: Official Claude Announcement

Fun Fact 💡

Claude 3.5 Sonnet made the most money on OpenAI's SWE-Lancer benchmark—a benchmark testing AI tools' performance on real Upwork tasks.

How to Access Claude 3.7 Sonnet and Claude Code

Claude 3.7 Sonnet is available to all users.
While the free tier does not include extended thinking capabilities, Pro, Team, and Enterprise users can access itvia Web or Apps.
Developers can access the latest model via the Anthropic API - use model string claude-3-7-sonnet-20250219, Amazon Bedrock or Google Cloud Vertex AI
Claude Code is currently in limited research preview and is available only to a few select users. You can join the waitlist here.

Real-World Projects with Claude 3.7 Sonnet

Generally speaking, Claude 3.7 Sonnet has been nothing short of remarkable.

Let's take a look at what users have been creating with Claude 3.7 Sonnet, all with a single prompt:

Stunning 3D City (with Live NPCs) by Ozgur Ozer

Animated Weather Cards by @AGI_FromWalmart

Ball in a Rotating Hexagon by @t3dotgg

The result below was generated by the standard Claude 3.7. Interestingly, the result from the extended thinking mode was broken.

TL;DR: Developers have generally found Claude to be a great coding companion—the best among all the currently available models.

These models often give me the same feeling I had when using ChatGPT-4 for the first time, where I am equally impressed and a little unnerved by what it can do.—Ethan Mollick

Where the future of AI development is headed

With Claude 3.7 Sonnet and Claude Code, we're seeing a shift from AI as a simple autocomplete tool to an autonomous development assistant.

Here are some trends we can expect:

AI-native development environments will become standard, with models that understand code at a fundamental level.
Continuous integration with AI agents that automatically maintain and improve codebases.
Specialized AI assistants for different development roles will emerge (frontend, backend, DevOps).

So, give them a try today, we can't wait to see what you build!

Start Monitoring Your Claude 3.7 Sonnet App 💡

Integrate Helicone with your Claude 3.7 Sonnet app to start tracking cost and usage in production.

FAQs

How does Claude 3.7's "extended thinking" feature work?

Claude 3.7 allows users to set a "thinking budget" of up to 128K tokens. During this thinking phase, the model explores multiple approaches, evaluates potential solutions, and engages in deeper reasoning before providing a response. This thinking is transparent, so users can see the model's reasoning process.

Is Claude Code secure for use with proprietary code?

Claude Code operates under the same data security policies as other Anthropic products. Code submitted to Claude Code is subject to Anthropic's data retention policies. Claude Code only sends necessary data to the API and doesn't store your entire codebase or train models with your data.

How does Claude 3.7 compare to other leading AI models for coding tasks?

Claude 3.7 currently leads in benchmark performance on coding tasks, particularly on SWE-bench Verified and TAU-bench. Its hybrid reasoning capabilities give it an edge in complex software engineering tasks that require a deep understanding of codebases and programming concepts.

What programming languages does Claude 3.7 support?

Claude 3.7 shows strong performance across widely-used languages including Python, JavaScript, TypeScript, Java, C++, Go, and Rust. It generally performs better with more common languages and frameworks due to their representation in training data.

Do I need special hardware to run Claude Code?

Claude Code is a command-line tool that runs on standard developer machines. The heavy computational work happens on Anthropic's servers, so local hardware requirements are minimal. However, a stable internet connection is necessary as the tool communicates with Anthropic's API.

When is Claude 4.0 going to be released?

There is no official release date for Claude 4.0 yet. While a lot of people expected the next model after Claude 3.5 to be Claude 4.0, Anthropic clearly had different plans. We're just as eager as you to see what 4.0 has in store.

What is Claude's 3.7's context window?

Claude 3.7 has a context window of 200k tokens, like Claude 3.5. This allows the model to consider a large amount of information when generating responses.

Can Claude 3.7 access the internet?

No. Claude 3.7 does not have direct internet access. It also does not possess "Deep Research" abilities unlike most of its competitors.

Questions or feedback?

Are the information out of date? Please raise an issue or contact us, we'd love to hear from you!

Time: 12 minute read

Created: February 25, 2024

Authors: Yusuf Ishola, Lina Lam