Performance Tuning¶

Claude Code offers multiple models, modes, and settings that affect speed, quality, and cost. This guide covers how to choose the right configuration for each task and keep your sessions efficient.

Model Selection¶

Claude Code supports several models in the Claude family. Each trades off capability, speed, and cost differently.

Model	Best For	Speed	Cost
Claude Opus 4.7	Complex architecture, multi-file refactors, subtle bugs	Slower	Highest
Claude Sonnet 4.6	General development, code review, feature implementation	Balanced	Medium
Claude Haiku 4.5	Quick questions, simple edits, formatting, boilerplate	Fastest	Lowest

When to Use Opus¶

Choose Opus for tasks where quality matters most:

Designing system architecture across many files
Debugging subtle concurrency or race condition issues
Large-scale refactoring that requires understanding the full codebase
Complex migration planning with many interdependencies

When to Use Sonnet¶

Sonnet is the default and handles most development tasks well:

Writing new features and implementing business logic
Code review and PR feedback
Writing and fixing tests
General debugging with clear error messages

When to Use Haiku¶

Use Haiku when speed matters more than depth:

Generating boilerplate code
Simple rename or formatting tasks
Quick syntax questions
Exploratory questions about your codebase

Fast Mode¶

Toggle fast mode with /fast in any session. Fast mode uses the same model but optimizes for faster output. This is useful for:

Rapid iteration on small changes
Sessions where you are making many quick edits
Tasks where waiting for output slows your flow

Fast mode does not reduce quality — it uses the same Claude Opus 4.7 model with faster output generation.

The 1M Token Context Window (Opus 4.7)¶

As of v2.1.139, Opus 4.7 supports a 1M token context window by default on Max, Team, and Enterprise plans (previously this required extra usage). Sonnet and Haiku have smaller context windows and will hit compression sooner.

With Opus, you can work through large features, multi-file refactors, and extended debugging sessions without hitting context limits. However, larger context does not mean free -- every token in the window is sent with each new message, affecting both speed and cost.

Performance implications at different context sizes (Opus): - Sessions under 200K tokens feel fast and responsive - Sessions at 400K-600K tokens may start to feel noticeably slower - Sessions approaching 1M tokens work but cost significantly more per exchange - /compact remains your primary tool for keeping sessions lean and fast

For Sonnet and Haiku users: Context management is more important since you will hit compression earlier. Use /compact more frequently and keep sessions focused.

Context Management for Performance¶

Long conversations slow Claude down and increase token costs. Opus's 1M limit gives you room to breathe, but keeping sessions lean still improves speed across all models:

Use `/compact` Strategically¶

The /compact command compresses your conversation history while preserving key context. With Opus's 1M window, you no longer need to compact to survive -- but compact to stay fast. On Sonnet and Haiku, compacting also prevents hitting the smaller context ceiling:

After finishing a subtask, before starting the next one
When context exceeds ~200K tokens and responses feel slower
When Claude seems to be losing track of earlier context
When /cost shows spending climbing faster than expected

Start Fresh for New Tasks¶

Use /clear or start a new claude session for unrelated tasks. A clean context means faster responses and more focused output.

Write Good CLAUDE.md Files¶

Well-written CLAUDE.md files reduce the need for repeated explanations. Instead of telling Claude about your project structure every session, document it once:

```markdown

Project Structure¶

src/api/ — Express route handlers
src/services/ — Business logic
src/models/ — Sequelize models
tests/ — Jest test files mirroring src/ structure ```

Reducing Token Usage¶

Every token you send and receive costs money. These patterns reduce waste:

Be Concise in Prompts¶

```bash

Efficient: clear and brief¶

claude "Add input validation to the POST /users endpoint. Require email and name, both non-empty strings."

Wasteful: verbose and repetitive¶

claude "I need you to please add some input validation to the users endpoint. The endpoint is POST /users and it should validate that the email field is present and is a non-empty string, and also that the name field is present and is a non-empty string as well." ```

Avoid Re-reading Large Files¶

If Claude has already read a file in this session, it remembers the contents. Do not paste file contents that Claude can read with its tools.

Scope Tasks Narrowly¶

```bash

Efficient: targeted¶

claude "Fix the date parsing bug in src/utils/dates.ts line 42"

Costly: broad¶

claude "Fix all the bugs in the project" ```

Headless Mode Performance¶

In CI and automation, use headless mode (-p flag) for single-shot tasks:

```bash

Fast: single prompt, no interactive overhead¶

claude -p "Generate TypeScript types from the OpenAPI spec at api/spec.yaml"

Pipe input directly¶

cat error.log | claude -p "What caused this error?" ```

Headless mode skips the interactive session setup, saving time for automated workflows.

Optimizing for Specific Workflows¶

Code Review¶

For faster reviews, tell Claude what to focus on:

bash claude "Review this diff for security issues only. Skip style and formatting feedback."

Batch Operations¶

Group related changes into one prompt instead of making many small requests:

```bash

claude "Rename the User model to Account across the entire codebase: model, routes, tests, and types."

Instead of separate prompts for each file¶

```

Parallel Agents¶

For large tasks, Claude can spawn sub-agents that work in parallel. This is automatic for tasks like codebase exploration but you can encourage it:

bash claude "Search for all usages of the deprecated authenticate() function and update them to use verifySession() instead. Work on multiple files in parallel."

Monitoring Performance¶

Token Usage¶

Check your token consumption in the Claude Code dashboard or by watching the session output. If costs are high:

Use /compact more frequently
Switch to Haiku for simple tasks
Keep prompts focused and avoid restating context

Response Time¶

If responses feel slow:

Toggle /fast mode
Check if your conversation is very long (use /compact)
Consider whether a lighter model would suffice
Ensure your network connection is stable

Performance Tuning¶

Model Selection¶

When to Use Opus¶

When to Use Sonnet¶

When to Use Haiku¶

Fast Mode¶

The 1M Token Context Window (Opus 4.7)¶

Context Management for Performance¶

Use /compact Strategically¶

Start Fresh for New Tasks¶

Write Good CLAUDE.md Files¶

Project Structure¶

Reducing Token Usage¶

Be Concise in Prompts¶

Efficient: clear and brief¶

Wasteful: verbose and repetitive¶

Avoid Re-reading Large Files¶

Scope Tasks Narrowly¶

Efficient: targeted¶

Costly: broad¶

Headless Mode Performance¶

Fast: single prompt, no interactive overhead¶

Pipe input directly¶

Optimizing for Specific Workflows¶

Code Review¶

Batch Operations¶

One prompt for all related changes¶

Instead of separate prompts for each file¶

Parallel Agents¶

Monitoring Performance¶

Token Usage¶

Response Time¶

See Also¶

Use `/compact` Strategically¶