POST /v1/llm/generations
curl --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini-2.5-pro",
  "prompt": "What is happening in this video?",
  "video_urls": [
    "https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"
  ],
  "max_tokens": 128
}
'
{
  "id": "task-llmrouter-1776874481-rj6bs3yb",
  "object": "llm.generation.task",
  "type": "llm",
  "model": "gemini-2.5-pro",
  "status": "pending",
  "progress": 0,
  "created": 1776874481,
  "stream": null,
  "results": null,
  "error": null
}

Authorizations

Authorization
string
header
required

All endpoints require Bearer token authentication. Add the following request header:

Authorization: Bearer YOUR_API_KEY

YOUR_API_KEY is your API token (sk-... format).

Body

application/json
model
string
default:gemini-2.5-pro
required

Model name. Common video-capable models:

  • gemini-2.5-pro (recommended)
  • nemotron-3-nano-omni (single video only)
Examples:

"gemini-2.5-pro"

"nemotron-3-nano-omni"

prompt
string
required

User prompt, up to 100,000 characters.

Maximum string length: 100000
Example:

"What is happening in this video?"

video_urls
string[]
required

Array of video sources (1–10 elements). Each element must take one of the following two forms:

  • A publicly reachable HTTP/HTTPS URL
  • An inline data URI of the form data:video/<type>;base64,<payload> (base64-encoded video payloads are large; see the sketch after this field's example)

Model constraints:

  • gemini-2.5-pro: supports multiple videos
  • nemotron-3-nano-omni: single video only; requests with video_urls.length > 1 return HTTP 422
  • Other LLM models do not support video

Cost note: video is tokenized frame by frame over the clip's duration, so a 30-second clip may consume 20K+ tokens. Prefer short clips or low-frame-rate sources.

Required array length: 1–10 elements
Example:
[
"https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"
]
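
A minimal sketch of the inline data-URI form, assuming a local file clip.mp4 (the file name and the shell plumbing around base64 are illustrative, not part of the API):

# Build a data URI from a local file and submit it. `base64 -w 0` disables
# line wrapping (GNU coreutils; BSD/macOS base64 does not wrap by default).
VIDEO_DATA_URI="data:video/mp4;base64,$(base64 -w 0 clip.mp4)"

curl --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data "{
    \"model\": \"gemini-2.5-pro\",
    \"prompt\": \"What is happening in this video?\",
    \"video_urls\": [\"$VIDEO_DATA_URI\"],
    \"max_tokens\": 128
  }"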
sync
boolean
default:false

Synchronous mode: when true, the request blocks and the endpoint returns the full response directly instead of a task (see the llm-text schema and the Response section below).

Example:

false

stream
boolean
default:false

Whether to stream the response (see the llm-text schema).

Example:

false
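
With sync and stream both left at false, submission returns the task envelope shown in the response example above. For the blocking variant, a hedged sketch: per the Response section, sync=true with stream=false makes the endpoint return the full OpenAI ChatCompletion JSON directly:

curl --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini-2.5-pro",
  "prompt": "What is happening in this video?",
  "video_urls": [
    "https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"
  ],
  "sync": true
}
'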

max_tokens
integer | null

Maximum number of tokens to generate. Optional.

Required range: x >= 1
Example:

128

temperature
number | null

Sampling temperature, range [0, 2]. Optional.

Required range: 0 <= x <= 2
system_prompt
string | null

System instruction. Optional.

Maximum string length: 10000
reasoning
boolean | null

Whether to include reasoning tokens. Thinking models such as gemini-2.5-pro may require this to be set to true.
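
A minimal sketch enabling reasoning for a thinking model (the other field values simply mirror the request example at the top of this page):

curl --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini-2.5-pro",
  "prompt": "What is happening in this video?",
  "video_urls": [
    "https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"
  ],
  "reasoning": true
}
'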

Response

Task created (async mode) / full response (sync mode)

The submit response conforms to the unified task shape. results and error are fixed at null at submit time; they are populated via GET /v1/tasks/{task_id} after the task completes or fails (a polling sketch follows below). With sync=true and stream=false, the endpoint instead returns the full OpenAI ChatCompletion JSON directly.
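
A hedged polling sketch for async mode, assuming GET /v1/tasks/{task_id} (referenced above) returns this same task shape with an updated status; the 5-second interval and the use of jq are illustrative:

# Submit the task, capture its id, then poll until it leaves "pending".
TASK_ID=$(curl -s --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{"model": "gemini-2.5-pro", "prompt": "What is happening in this video?", "video_urls": ["https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"]}' \
  | jq -r '.id')

while true; do
  STATUS=$(curl -s --url "https://api.foxapi.cc/v1/tasks/$TASK_ID" \
    --header 'Authorization: Bearer <token>' | jq -r '.status')
  echo "status: $STATUS"
  # "pending" is the only status documented at submit time; terminal status
  # names are not specified here, so stop as soon as it changes.
  [ "$STATUS" != "pending" ] && break
  sleep 5
done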

id
string
required

Task ID in the form task-llmrouter-{timestamp}-{8random}, where {timestamp} is the Unix creation time and {8random} is an 8-character random suffix.

Example:

"task-llmrouter-1776874565-yq3szvcu"

object
enum<string>
required
Available options:
llm.generation.task
Example:

"llm.generation.task"

type
enum<string>
required
Available options:
llm
Example:

"llm"

model
string
required

The model name submitted by the client (echoed verbatim).

Example:

"gemini-2.5-pro"

status
enum<string>
required
Available options:
pending
Example:

"pending"

progress
integer
required
Example:

0

created
integer
required
Example:

1776874565

stream
object

Returns {url: ...} when stream=true; null when stream=false.

results
object[] | null

Fixed at null at submit time; after the task completes, GET /v1/tasks/{task_id} returns the populated array, where results[0] is the full OpenAI ChatCompletion response (see the extraction sketch below).

Example:

null
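
Once the task has completed, a sketch extracting the generated text, assuming results[0] follows the standard OpenAI ChatCompletion shape (choices[0].message.content); the exact jq path is an assumption:

# Hypothetical extraction of the text from a finished task.
curl -s --url "https://api.foxapi.cc/v1/tasks/$TASK_ID" \
  --header 'Authorization: Bearer <token>' \
  | jq -r '.results[0].choices[0].message.content'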

error
object

Fixed at null during submit; returned via GET /v1/tasks/{task_id} when the task fails.

Example:

null