Scribe V2 Speech Recognition

curl --request POST \
  --url https://api.foxapi.cc/v1/audios/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "scribe-v2",
  "audio_url": "https://samplelib.com/lib/preview/mp3/sample-3s.mp3"
}
'

{
  "created": 1757165031,
  "id": "task-unified-1757165031-uyujaw3d",
  "model": "<string>",
  "object": "audio.generation.task",
  "progress": 0,
  "status": "pending",
  "task_info": {
    "can_cancel": true,
    "estimated_time": 45
  },
  "type": "audio"
}

Authorizations

Authorization

string

header

required

All APIs require Bearer Token authentication

Add to request header:

Authorization: Bearer YOUR_API_KEY
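
As a rough illustration of the header above, the following Python sketch builds (but does not send) an authenticated request using only the standard library. The endpoint URL and the payload come from the curl example; the helper name `build_request` is hypothetical and not part of the API.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.foxapi.cc/v1/audios/generations"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request without sending it.

    `build_request` is a hypothetical helper name used for illustration.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


req = build_request(
    "YOUR_API_KEY",
    {
        "model": "scribe-v2",
        "audio_url": "https://samplelib.com/lib/preview/mp3/sample-3s.mp3",
    },
)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(req) as resp:
#     task = json.load(resp)
```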

Body

application/json

model

string

default:scribe-v2

required

scribe-v2: Speech recognition model that supports speaker diarization, audio event tagging, and key-term biasing

Example:

"scribe-v2"

audio_url

string

required

Audio file URL to transcribe

Notes:

Must be an HTTP/HTTPS accessible URL
The audio file must be directly accessible and readable by the system

Example:

"https://samplelib.com/lib/preview/mp3/sample-3s.mp3"

language_code

string | null

Audio language code

Notes:

Supports ISO-639-1 or ISO-639-3 codes
Examples: zh / zho / en / eng
Auto-detected if not provided

Example:

"zh"

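A client may want to catch obviously malformed language codes before submitting a task. The sketch below only checks the shape of the code (two or three letters, matching the length of ISO-639-1 and ISO-639-3 codes); it does not verify that the code is actually registered or supported. The function name and the lowercasing step are assumptions made for illustration.

```python
import re
from typing import Optional

# Two letters (ISO-639-1 shape) or three letters (ISO-639-3 shape).
_LANG_CODE_SHAPE = re.compile(r"[a-z]{2,3}")


def normalize_language_code(code: Optional[str]) -> Optional[str]:
    """Return a lowercased code, or None to let the service auto-detect.

    Hypothetical helper: it validates the shape only, not whether the
    code is a real ISO-639 entry or one the service accepts.
    """
    if code is None:
        return None
    cleaned = code.strip().lower()  # lowercasing is an assumption
    if not _LANG_CODE_SHAPE.fullmatch(cleaned):
        raise ValueError(f"not a 2- or 3-letter language code: {code!r}")
    return cleaned
```
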
tag_audio_events

boolean

default:true

Whether to tag audio events such as laughter and applause. Enabled by default.

Example:

true

diarize

boolean

default:true

Whether to perform speaker diarization. Enabled by default.

Example:

true

keyterms

string[] | null

Bias terms / phrase list

Notes:

Up to 100 entries
Each entry up to 50 characters
Used to boost recognition of specific terms or proper nouns

Do not pass this parameter unless necessary.

Maximum array length: 100

Maximum string length: 50

Example:

[
  "project kickoff",
  "quarterly results",
  "speech to text"
]
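
The body parameters above can be assembled and checked on the client side before the request is sent. This is a minimal sketch, assuming the documented limits (up to 100 key terms, each up to 50 characters) are enforced per Python character; how the service itself counts characters is not stated here. The helper name `build_payload` is hypothetical.

```python
from typing import Optional, Sequence

MAX_KEYTERMS = 100        # documented maximum array length
MAX_KEYTERM_LENGTH = 50   # documented maximum string length


def build_payload(
    audio_url: str,
    language_code: Optional[str] = None,
    tag_audio_events: bool = True,
    diarize: bool = True,
    keyterms: Optional[Sequence[str]] = None,
) -> dict:
    """Build the JSON body for a scribe-v2 request (hypothetical helper)."""
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audio_url must be an HTTP/HTTPS URL")

    payload = {
        "model": "scribe-v2",
        "audio_url": audio_url,
        "tag_audio_events": tag_audio_events,
        "diarize": diarize,
    }
    # Omit language_code so the service auto-detects the language.
    if language_code is not None:
        payload["language_code"] = language_code
    # Only send keyterms when there is something to send.
    if keyterms:
        if len(keyterms) > MAX_KEYTERMS:
            raise ValueError(f"at most {MAX_KEYTERMS} keyterms are allowed")
        for term in keyterms:
            # len() counts Unicode code points; the server-side rule is an assumption.
            if not isinstance(term, str) or len(term) > MAX_KEYTERM_LENGTH:
                raise ValueError(f"invalid keyterm: {term!r}")
        payload["keyterms"] = list(keyterms)
    return payload
```

Leaving `keyterms` out of the payload when it is empty follows the note above about not passing the parameter unless necessary.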

Response

Task created successfully

created

integer

Task creation timestamp

Example:

1757165031

id

string

Task ID

Example:

"task-unified-1757165031-uyujaw3d"

model

string

Actual model name used

object

enum<string>

Specific task type

Available options:

audio.generation.task

progress

integer

Task progress percentage (0-100)

Required range: 0 <= x <= 100

Example:

0

status

enum<string>

Task status

Available options:

pending

processing

completed

failed

Example:

"pending"

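Because the task is created in the pending state, a client usually has to check its status again later. This page does not describe the endpoint for querying a task, so the sketch below takes the fetch step as a caller-supplied function. Treating completed and failed as the terminal states is an assumption based on the status names, and `wait_for_task` is a hypothetical helper.

```python
import time
from typing import Callable

KNOWN_STATUSES = {"pending", "processing", "completed", "failed"}
# Assumption: these two statuses mean the task will not change any further.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(
    fetch_task: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_task() until the task reaches a terminal status.

    fetch_task is supplied by the caller and must return the task as a
    dict with a "status" key; how it retrieves the task is not covered here.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task()
        status = task.get("status")
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected task status: {status!r}")
        if status in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```
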
task_info

object

Asynchronous task info

type

enum<string>

Task output type

Available options:

audio

Example:

"audio"

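The response fields above can be read into a small typed structure on the client side. This is a sketch under stated assumptions: `created` is interpreted as a Unix timestamp in seconds (inferred from the example value), and the `task_info` children `can_cancel` and `estimated_time` are taken from the example response, with no unit assumed for `estimated_time`. The class and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TranscriptionTask:
    id: str
    model: str
    status: str
    progress: int
    created: datetime
    can_cancel: bool
    estimated_time: Optional[int]


def parse_task(raw: str) -> TranscriptionTask:
    """Parse the task-creation response body (hypothetical helper)."""
    data = json.loads(raw)
    info = data.get("task_info") or {}
    return TranscriptionTask(
        id=data["id"],
        model=data.get("model", ""),
        status=data["status"],
        progress=int(data.get("progress", 0)),
        # Assumption: "created" is seconds since the Unix epoch.
        created=datetime.fromtimestamp(data["created"], tz=timezone.utc),
        can_cancel=bool(info.get("can_cancel", False)),
        estimated_time=info.get("estimated_time"),
    )


# The example response shown at the top of this page.
example = """
{
  "created": 1757165031,
  "id": "task-unified-1757165031-uyujaw3d",
  "model": "<string>",
  "object": "audio.generation.task",
  "progress": 0,
  "status": "pending",
  "task_info": {"can_cancel": true, "estimated_time": 45},
  "type": "audio"
}
"""
task = parse_task(example)
```
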