POST /v1/llm/generations
curl --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini-2.5-pro",
  "prompt": "What is happening in this video?",
  "video_urls": [
    "https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"
  ],
  "max_tokens": 128
}
'
{
  "id": "task-llmrouter-1776874481-rj6bs3yb",
  "object": "llm.generation.task",
  "type": "llm",
  "model": "gemini-2.5-pro",
  "status": "pending",
  "progress": 0,
  "created": 1776874481,
  "stream": null,
  "results": null,
  "error": null
}

Authorizations

Authorization
string
header
required

All endpoints require Bearer token authentication. Add the following request header:

Authorization: Bearer YOUR_API_KEY

YOUR_API_KEY is your API token (sk-... format).

Body

application/json
model
string
default:gemini-2.5-pro
required

Model name. Common video-capable models:

  • gemini-2.5-pro (recommended)
  • nemotron-3-nano-omni (single video only)
Examples:

"gemini-2.5-pro"

"nemotron-3-nano-omni"

prompt
string
required

User prompt, up to 100,000 characters.

Maximum string length: 100000
Example:

"What is happening in this video?"

video_urls
string[]
required

Array of video sources (1–10 elements). Each element must take one of the following two forms:

  • A publicly reachable HTTP/HTTPS URL
  • An inline data URI of the form data:video/<type>;base64,<payload> (base64-encoded video payloads are large; see the sketch after this field's example)

Model constraints:

  • gemini-2.5-pro: supports multiple videos
  • nemotron-3-nano-omni: single video only; requests with video_urls.length > 1 return HTTP 422
  • Other LLM models do not support video

Cost note: video is tokenized frame by frame over the clip's duration, so a 30-second clip may consume 20K+ tokens. Prefer short clips or low-frame-rate sources.

Required array length: 1–10 elements
Example:
[
"https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"
]
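
A minimal sketch of the inline data-URI form, assuming a local file clip.mp4 (the file name and the shell plumbing around base64 are illustrative, not part of the API):

# Build a data URI from a local file and submit it. `base64 -w 0` disables
# line wrapping (GNU coreutils; BSD/macOS base64 does not wrap by default).
VIDEO_DATA_URI="data:video/mp4;base64,$(base64 -w 0 clip.mp4)"

curl --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data "{
    \"model\": \"gemini-2.5-pro\",
    \"prompt\": \"What is happening in this video?\",
    \"video_urls\": [\"$VIDEO_DATA_URI\"],
    \"max_tokens\": 128
  }"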
sync
boolean
default:false

Synchronous mode: when true, the request blocks and the endpoint returns the full response directly instead of a task (see the llm-text schema and the Response section below).

Example:

false

stream
boolean
default:false

Whether to stream the response (see the llm-text schema).

Example:

false
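
With sync and stream both left at false, submission returns the task envelope shown in the response example above. For the blocking variant, a hedged sketch: per the Response section, sync=true with stream=false makes the endpoint return the full OpenAI ChatCompletion JSON directly:

curl --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini-2.5-pro",
  "prompt": "What is happening in this video?",
  "video_urls": [
    "https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"
  ],
  "sync": true
}
'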

max_tokens
integer | null

Maximum number of tokens to generate. Optional.

Required range: x >= 1
Example:

128

temperature
number | null

Sampling temperature, range [0, 2]. Optional.

Required range: 0 <= x <= 2
system_prompt
string | null

System instruction. Optional.

Maximum string length: 10000
reasoning
boolean | null

Whether to include reasoning tokens. Thinking models such as gemini-2.5-pro may require this to be set to true.
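
A minimal sketch enabling reasoning for a thinking model (the other field values simply mirror the request example at the top of this page):

curl --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini-2.5-pro",
  "prompt": "What is happening in this video?",
  "video_urls": [
    "https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"
  ],
  "reasoning": true
}
'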

Response

Task created (async mode) / full response (sync mode)

The submit response conforms to the unified task shape. results and error are fixed at null at submit time; they are populated via GET /v1/tasks/{task_id} after the task completes or fails (a polling sketch follows below). With sync=true and stream=false, the endpoint instead returns the full OpenAI ChatCompletion JSON directly.
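
A hedged polling sketch for async mode, assuming GET /v1/tasks/{task_id} (referenced above) returns this same task shape with an updated status; the 5-second interval and the use of jq are illustrative:

# Submit the task, capture its id, then poll until it leaves "pending".
TASK_ID=$(curl -s --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{"model": "gemini-2.5-pro", "prompt": "What is happening in this video?", "video_urls": ["https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"]}' \
  | jq -r '.id')

while true; do
  STATUS=$(curl -s --url "https://api.foxapi.cc/v1/tasks/$TASK_ID" \
    --header 'Authorization: Bearer <token>' | jq -r '.status')
  echo "status: $STATUS"
  # "pending" is the only status documented at submit time; terminal status
  # names are not specified here, so stop as soon as it changes.
  [ "$STATUS" != "pending" ] && break
  sleep 5
done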

id
string
required

Task ID in the form task-llmrouter-{timestamp}-{8random}, where {timestamp} is the Unix creation time and {8random} is an 8-character random suffix.

Example:

"task-llmrouter-1776874565-yq3szvcu"

object
enum<string>
required
Available options:
llm.generation.task
Example:

"llm.generation.task"

type
enum<string>
required
Available options:
llm
Example:

"llm"

model
string
required

The model name submitted by the client (echoed verbatim).

Example:

"gemini-2.5-pro"

status
enum<string>
required
Available options:
pending
Example:

"pending"

progress
integer
required
Example:

0

created
integer
required
Example:

1776874565

stream
object

Returns {url: ...} when stream=true; null when stream=false.

results
object[] | null

Fixed at null at submit time; after the task completes, GET /v1/tasks/{task_id} returns the populated array, where results[0] is the full OpenAI ChatCompletion response (see the extraction sketch below).

Example:

null
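
Once the task has completed, a sketch extracting the generated text, assuming results[0] follows the standard OpenAI ChatCompletion shape (choices[0].message.content); the exact jq path is an assumption:

# Hypothetical extraction of the text from a finished task.
curl -s --url "https://api.foxapi.cc/v1/tasks/$TASK_ID" \
  --header 'Authorization: Bearer <token>' \
  | jq -r '.results[0].choices[0].message.content'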

error
object

Fixed at null during submit; returned via GET /v1/tasks/{task_id} when the task fails.

Example:

null