Gemini Native (Text)
Text Series
Gemini Native (Text)
Call LeapX API using Google Gemini native format
POST
Gemini Native (Text)
Introduction
The Gemini Native API uses Google Gemini’s request and response format. It is suitable for Google official clients (e.g. thegoogle-generativeai SDK) or when you need to work directly with Gemini data structures. The API follows the Gemini specification and supports thinking mode, multimodal input, tool calling, Google Search (Grounding), context caching, image generation, and other full capabilities.
If you use an OpenAI-compatible client (e.g. OpenAI SDK), use the
/v1/chat/completions endpoint instead.Difference from OpenAI format
| Aspect | Gemini Native | OpenAI-compatible (/v1/chat/completions) |
|---|---|---|
| Message structure | contents[].parts[] (text / inlineData / fileData) | messages[].content |
| Roles | user / model | user / assistant / system |
| System prompt | systemInstruction.parts | messages with role=system |
| Streaming | streamGenerateContent?alt=sse | stream: true |
| Thinking mode | generationConfig.thinkingConfig or model suffix | Model suffix (e.g. -thinking) |
API endpoints
| Feature | Method | Path |
|---|---|---|
| Text generation (non-streaming) | POST | /v1beta/models/{model}:generateContent |
| Text generation (streaming) | POST | /v1beta/models/{model}:streamGenerateContent?alt=sse |
| Single Embedding | POST | /v1beta/models/{model}:embedContent |
| Batch Embedding | POST | /v1beta/models/{model}:batchEmbedContents |
{model} in the path with the actual model ID, e.g. gemini-2.5-pro, gemini-3-pro-preview.
Authentication
Any of the following is supported:Bearer token:
Bearer sk-xxxxxxxxxx (recommended, consistent with other LeapX endpoints)Google-style API key:
x-goog-api-key: sk-xxxxxxxxxx?key=sk-xxxxxxxxxx.
Request parameters
generateContent / streamGenerateContent
List of conversation contents. Each item has
role (user or model) and parts. Each part can be: {"text": "..."}, {"inlineData": {"mimeType": "...", "data": "base64..."}}, or {"fileData": {"mimeType": "...", "fileUri": "gs://..."}}.Generation config.
temperature: 0–2, randomnesstopP: nucleus samplingtopK: top-K samplingmaxOutputTokens: max output tokensstopSequences: stop sequencesresponseMimeType: e.g.text/plainresponseModalities: e.g.["TEXT"]or["IMAGE"]thinkingConfig: thinking mode (see below)imageConfig: image generation config (see below)
System instruction:
{"parts": [{"text": "..."}]}.Safety levels, e.g.
[{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}].Tool declarations (function calling), see advanced features.
Tool config, e.g.
functionCallingConfig.mode: AUTO / ANY / NONE.Context caching ID returned by the API; used to reuse cached context.
Response format
Non-streaminggenerateContent returns JSON:
data: and contains a JSON fragment (e.g. candidates[].content.parts).
Basic examples
- cURL (non-streaming)
- cURL (streaming)
- Python (google-generativeai)
- Node.js
By default,
google-generativeai calls Google’s API. To use LeapX, set api_endpoint to https://api.leapx-hub.com via client_options or environment variables. See your SDK docs for details.Advanced features
Thinking mode
Supported in three ways:- generationConfig.thinkingConfig (Gemini 2.5 Pro): use
thinkingBudget(token count) - thinkingConfig.thinkingLevel (Gemini 3 Pro): use
LOW/HIGH - Model suffix:
-thinking,-thinking-8192,-nothinking,-thinking-low,-thinking-high
- thinkingBudget (2.5 Pro)
- thinkingLevel (3 Pro)
Multimodal input
Mix text and media incontents[].parts:
- Image:
inlineDatawith base64data, orfileDatawithfileUri(e.g.gs://...) - Audio:
inlineDatawithmimeTypesuch asaudio/mp3
Tool calling (Function Calling)
functionCall part; include the corresponding functionResponse in the next contents and send another request.
Google Search (Grounding)
When enabled, the model can use real-time web search to improve answers (e.g. weather, news). AddgoogleSearch to tools:
googleSearch: {} and functionDeclarations as separate elements in the same tools array. Responses may include retrieval metadata (e.g. groundingMetadata).
Streaming
Use:POST /v1beta/models/{model}:streamGenerateContent?alt=sse. Request body is the same as generateContent. Response is SSE; each data: line is a JSON chunk.
Context caching
First request does not includecachedContent. If the server returns a cache ID, subsequent requests can send:
Image generation (e.g. Gemini 2.5 Flash)
When the model supports image output, set ingenerationConfig:
candidates[].content.parts may include inlineData (e.g. base64 image).
Embedding API
Single: embedContent
Endpoint:POST https://api.leapx-hub.com/v1beta/models/{model}:embedContent
Request body example:
model in the path: /v1beta/models/text-embedding-004:embedContent, with body containing only content.
Batch: batchEmbedContents
Endpoint:POST https://api.leapx-hub.com/v1beta/models/{model}:batchEmbedContents
Request body example:
Error handling
Errors are returned as HTTP status codes and JSON body, for example:| Status | Meaning |
|---|---|
| 400 | Invalid request (e.g. missing contents, unsupported parameter) |
| 401 | Authentication failed (invalid or missing API key) |
| 404 | Model not found or wrong path |
| 429 | Rate limited; retry later |
| 500 | Server error |
error.message in your client and handle retries or user messaging accordingly.
Comparison with OpenAI format
| Item | Gemini Native | OpenAI (/v1/chat/completions) |
|---|---|---|
| Base path | /v1beta/models/{model}:generateContent | /v1/chat/completions |
| Auth | Authorization: Bearer sk-xxx or x-goog-api-key | Authorization: Bearer sk-xxx |
| Message format | contents[].parts[] (text/inlineData/fileData) | messages[].content (string or array) |
| System prompt | systemInstruction.parts | messages with role: "system" |
| Streaming | streamGenerateContent?alt=sse | stream: true |
| Thinking | thinkingConfig or model suffix | Model suffix (e.g. -thinking) |
| Tools | tools[].functionDeclarations | tools[].function (OpenAI shape) |
| Typical clients | Google SDK, custom HTTP client | OpenAI SDK, OpenAI-compatible clients |
thinkingConfig, native multimodal parts). Use /v1/chat/completions when you want to stay within the OpenAI ecosystem.