For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Creates a preset (or a new version of an existing one) from an inference request body. Only fields that overlap with the preset config are persisted; other fields (e.g. messages, stream, prompt) are silently ignored.
Authentication
AuthorizationBearer
API key as bearer token in Authorization header
Path parameters
slugstringRequired>=1 character
URL-safe slug identifying the preset. Created if it does not exist.
Request
This endpoint expects an object.
backgroundboolean or nullOptional
cache_controlobjectOptional
Enable automatic prompt caching. When set at the top level, the system automatically applies cache breakpoints to the last cacheable block in the request. Currently supported for Anthropic Claude models.
frequency_penaltydouble or nullOptional
image_configstring or double or list of anyOptional
includelist of enums or nullOptional
Allowed values:
inputstring or list of objectsOptional
Input for a response request - can be a string or array of items
instructionsstring or nullOptional
max_output_tokensinteger or nullOptional
max_tool_callsinteger or nullOptional
metadatamap from strings to stringsOptional
Metadata key-value pairs for the request. Keys must be ≤64 characters and cannot contain brackets. Values must be ≤512 characters. Maximum 16 pairs allowed.
modalitieslist of enumsOptional
Output modalities for the response. Supported values are "text" and "image".
Allowed values:
modelstringOptional
modelslist of stringsOptional
parallel_tool_callsboolean or nullOptional
pluginslist of objectsOptional
Plugins you want to enable for this request, including their settings.
presence_penaltydouble or nullOptional
previous_response_idstring or nullOptional
promptobjectOptional
prompt_cache_keystring or nullOptional
providerobjectOptional
When multiple model providers are available, optionally indicate your routing preference.
reasoningobjectOptional
Configuration for reasoning mode in the response
routeanyOptional
safety_identifierstring or nullOptional
service_tierenum or nullOptionalDefaults to auto
Allowed values:
session_idstringOptional<=256 characters
A unique identifier for grouping related requests (e.g., a conversation or agent workflow) for observability. If provided in both the request body and the x-session-id header, the body value takes precedence. Maximum of 256 characters.
stop_server_tools_whenlist of objectsOptional
Stop conditions for the server-tool agent loop. Any condition firing halts the loop (OR logic). When set, this overrides max_tool_calls.
storefalseOptional
streambooleanOptionalDefaults to false
temperaturedouble or nullOptional
textobjectOptional
Text output configuration including format and verbosity
tool_choiceenum or objectOptional
toolslist of objectsOptional
top_kintegerOptional
top_logprobsinteger or nullOptional
top_pdouble or nullOptional
traceobjectOptional
Metadata for observability and tracing. Known keys (trace_id, trace_name, span_name, generation_name, parent_span_id) have special handling. Additional keys are passed through as custom metadata to configured broadcast destinations.
truncationenumOptional
Allowed values:
userstringOptional<=256 characters
A unique identifier representing your end-user, which helps distinguish between different users of your app. This allows your app to identify specific users in case of abuse reports, preventing your entire app from being affected by the actions of individual users. Maximum of 256 characters.