PublicEndpoint - Python SDK

PublicEndpoint method reference

The Python SDK and docs are currently in beta. Report issues on GitHub.

Information about a specific model endpoint

Fields

| Field | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| `context_length` | `int` | ✔️ | N/A | |
| `latency_last_30m` | `Nullable[components.PercentileStats]` | ✔️ | Latency percentiles in milliseconds over the last 30 minutes. Latency measures time to first token. Only visible when authenticated with an API key or cookie; returns null for unauthenticated requests. | `{"p50": 25.5, "p75": 35.2, "p90": 48.7, "p99": 85.3}` |
| `max_completion_tokens` | `Nullable[int]` | ✔️ | N/A | |
| `max_prompt_tokens` | `Nullable[int]` | ✔️ | N/A | |
| `model_id` | `str` | ✔️ | The unique identifier for the model (permaslug) | `openai/gpt-4` |
| `model_name` | `str` | ✔️ | N/A | |
| `name` | `str` | ✔️ | N/A | |
| `pricing` | `components.Pricing` | ✔️ | N/A | |
| `provider_name` | `components.ProviderName` | ✔️ | N/A | `OpenAI` |
| `quantization` | `Nullable[components.PublicEndpointQuantization]` | ✔️ | N/A | `fp16` |
| `status` | `Optional[components.EndpointStatus]` | | N/A | `0` |
| `supported_parameters` | `List[components.Parameter]` | ✔️ | N/A | |
| `supports_implicit_caching` | `bool` | ✔️ | N/A | |
| `tag` | `str` | ✔️ | N/A | |
| `throughput_last_30m` | `Nullable[components.PercentileStats]` | ✔️ | N/A | `{"p50": 25.5, "p75": 35.2, "p90": 48.7, "p99": 85.3}` |
| `uptime_last_1d` | `Nullable[float]` | ✔️ | Uptime percentage over the last 1 day, calculated as successful requests / (successful + error requests) * 100. Rate-limited requests are excluded. Returns null if insufficient data. | |
| `uptime_last_30m` | `Nullable[float]` | ✔️ | N/A | |
| `uptime_last_5m` | `Nullable[float]` | ✔️ | Uptime percentage over the last 5 minutes, calculated as successful requests / (successful + error requests) * 100. Rate-limited requests are excluded. Returns null if insufficient data. | |
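
Several of the fields above are nullable, and the uptime fields follow a stated formula. The sketch below illustrates both points using a plain dictionary shaped like a PublicEndpoint rather than the SDK's own classes. The helper names `uptime_percentage` and `describe_endpoint` are hypothetical, the sample values are illustrative only, and treating "insufficient data" as zero counted requests is an assumption, since the actual threshold is not documented here.

```python
from typing import Optional


def uptime_percentage(successful: int, errors: int) -> Optional[float]:
    """Uptime as successful / (successful + errors) * 100.

    Rate-limited requests are assumed to be excluded from both counts
    before this function is called.
    """
    total = successful + errors
    if total == 0:
        # Assumption: no counted requests stands in for "insufficient data".
        return None
    return 100.0 * successful / total


def describe_endpoint(endpoint: dict) -> str:
    """Build a one-line summary, tolerating null latency and uptime."""
    latency = endpoint.get("latency_last_30m")
    # latency_last_30m is null for unauthenticated requests.
    p50 = f"{latency['p50']} ms" if latency is not None else "n/a"

    uptime = endpoint.get("uptime_last_1d")
    uptime_text = f"{uptime:.1f}%" if uptime is not None else "n/a"

    return (
        f"{endpoint['model_id']} via {endpoint['provider_name']}: "
        f"p50 latency {p50}, 1d uptime {uptime_text}"
    )


# Hypothetical sample data; only the keys mirror the fields table.
sample = {
    "model_id": "openai/gpt-4",
    "provider_name": "OpenAI",
    "latency_last_30m": {"p50": 25.5, "p75": 35.2, "p90": 48.7, "p99": 85.3},
    "uptime_last_1d": uptime_percentage(successful=995, errors=5),
}

print(describe_endpoint(sample))
```

Checking for null before formatting keeps the summary usable for unauthenticated responses, where the latency percentiles are not returned.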