Create Message Request (Claude)
Text Series
Create Message Request (Claude)
POST
Create Message Request (Claude)
Introduction
Claude’s native message API, suitable for native Anthropic clients like Claude Code. This API follows Anthropic’s specification and provides full Claude model capabilities, including Extended Thinking, tool calling, and other advanced features.If you’re using an OpenAI-compatible client (like OpenAI SDK), we recommend using the
/v1/chat/completions endpoint instead.Authentication
Bearer Token, e.g.,
Bearer sk-xxxxxxxxxxRequest Parameters
Claude model identifier, supported models include:
claude-opus-4-5-20251101- Claude Opus 4.5 (Latest, strongest reasoning)claude-haiku-4-5-20251001- Claude Haiku 4.5 (Latest, fastest)claude-sonnet-4-5-20250929- Claude Sonnet 4.5 (Latest, balanced)claude-opus-4-1-20250805- Claude Opus 4.1claude-sonnet-4-20250514- Claude Sonnet 4- Other Claude series models
List of conversation messages, each containing
role (user/assistant) and content. content can be a string or an array of media content.Maximum number of tokens to generate. Must be greater than 0.
System prompt, can be a string or an array of media content. Used to set the model’s behavior and role.
Randomness control, 0-1. Higher values make responses more random. Recommended to set to 1.0 when using extended thinking.
Nucleus sampling parameter, 0-1, controls generation diversity. Recommended to set to 0 when using extended thinking.
Top-K sampling parameter, only supported by some models.
Whether to enable streaming output, returns SSE format data chunks. Recommended to enable when using extended thinking.
List of stop sequences. Generation stops when the model produces these sequences.
Tool definitions list, supports function tools and web search tools.
Tool selection strategy, controls how the model uses tools.
Extended thinking configuration, enables Claude’s deep reasoning capability.
Request metadata for tracking and debugging.
MCP (Model Context Protocol) server configuration.
Context management configuration, controls how conversation context is handled.
Prompt Caching
Prompt Caching allows you to cache frequently used context content, significantly reducing costs and improving response speed. Supports using thecache_control parameter in system and messages.
Cache Control Parameters
Cache control configuration, can be used in
system array elements and content array elements in messages.type: Cache type"ephemeral": 5-minute cache (default, most cost-effective)"persistent": 1-hour cache (suitable for long-term stable context)
Caching Mechanism
- Cache Position: The last content block marked with
cache_controlwill be cached - Cache Threshold: Content needs at least 1024 tokens (Claude Sonnet 4.5) or 2048 tokens (Claude 3 Haiku)
- Cache Duration:
ephemeral: Valid for 5 minutespersistent: Valid for 1 hour
- Cost Savings: Cache reads are 90% cheaper than regular inputs
Use Cases
- Long Document Analysis: Cache large documents in
system, ask multiple questions - Codebase Understanding: Cache code context for multi-turn code analysis
- Knowledge Base Q&A: Cache knowledge base content for fast queries
- Multi-turn Conversations: Cache conversation history to maintain context coherence
Basic Examples
- Non-streaming Request
- Streaming Request (SSE)
- Python Example (Anthropic SDK)
Advanced Features
System Prompt
System prompts can be set as a string or an array of media content:- String Format
- Array Format
Extended Thinking
Claude supports extended thinking, allowing the model to perform deep reasoning. When enabled, the model will think internally before generating the final answer.- Basic Usage
- Python Example
budget_tokensmust be greater than 1024- When using extended thinking, it’s recommended to set
temperature: 1.0andtop_p: 0 - Streaming output (
stream: true) must be enabled to see the thinking process
Tool Calling
Supports function tools and web search tools:- Function Tools
- Claude Official Web Search Tool
- Complete Tool Calling Flow
tool_choice Parameter Details
tool_choice controls how the model uses tools:
| Value | Description |
|---|---|
{"type": "auto"} | Automatically decide whether to use tools (default) |
{"type": "any"} | Must use at least one tool |
{"type": "none"} | Don’t use any tools |
{"type": "tool", "name": "tool_name"} | Must use the specified tool |
Multimodal Input (Images)
Supports including images in messages:Prompt Caching
Caching frequently used context content can significantly reduce costs and improve response speed.- System Cache (5 minutes)
- Messages Cache (1 hour)
- Python SDK Example
Cache Key Points:
- Content must be ≥ 1024 tokens (Claude Sonnet 4.5) to trigger caching
ephemeralcache is valid for 5 minutespersistentcache is valid for 1 hour- Cache reads cost 90% less than regular inputs
- The last block with
cache_controlwill be cached - Cache is based on exact content match; any changes invalidate the cache
Response Format
- Non-streaming Response
- Streaming Response
input_tokens: Non-cached input tokens for the current requestcache_creation_input_tokens: Tokens cached for the first time (only present in first request)cache_read_input_tokens: Tokens read from cache (present when cache hits)output_tokens: Generated output tokens
Error Handling
The system processes upstream Claude API errors and returns standardized error response formats.| Error Type | HTTP Status Code | Description |
|---|---|---|
invalid_request | 400 | Request parameter error (e.g., missing required fields) |
authentication_error | 401 | Invalid or unauthorized API key |
rate_limit_error | 429 | Request rate limit exceeded |
upstream_error | 500 | Upstream service error |
nebula_api_error | 500 | System internal error |
Comparison with /v1/chat/completions
| Feature | /v1/messages | /v1/chat/completions |
|---|---|---|
| Authentication | Authorization: Bearer | Authorization: Bearer |
| Response Format | Anthropic native format | OpenAI compatible format |
| Extended Thinking | Native thinking parameter | Via reasoning_effort or reasoning parameter |
| Tool Calling | Native tools and tool_choice | OpenAI compatible format |
| Suitable Clients | Anthropic SDK, Claude Code | OpenAI SDK, compatible clients |
Notes
max_tokensis a required parameter and must be greater than 0messagesarray cannot be empty- When using extended thinking,
budget_tokensmust be greater than 1024 - Extended thinking requires streaming output to see the thinking process
- Tool calling requires multiple rounds of interaction: first round returns tool call request, second round returns tool execution result
- Image input requires base64 encoding
Related Resources
Chat Completions (OpenAI Compatible)
View OpenAI compatible chat endpoint documentation
Model List
View all supported model information
