POST /v1/audios/generations
curl --request POST \
  --url https://api.foxapi.cc/v1/audios/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "paraformer-v2",
  "file_urls": [
    "https://example.com/audio/meeting.wav"
  ]
}
'
{
  "created": 1757165031,
  "id": "task-unified-1757165031-uyujaw3d",
  "model": "<string>",
  "object": "audio.generation.task",
  "progress": 0,
  "status": "pending",
  "task_info": {
    "can_cancel": true,
    "estimated_time": 45
  },
  "type": "audio"
}
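
As a rough illustration, the sample response above could be read in Python along these lines. The `AudioTask` class and `parse_task` helper are hypothetical names introduced here, not part of the API; only the field names come from the sample response.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioTask:
    """Hypothetical container for the fields shown in the sample response."""
    id: str
    status: str
    progress: int
    can_cancel: bool
    estimated_time: Optional[int]


def parse_task(raw: str) -> AudioTask:
    """Parse a task-creation response body (a JSON string) into an AudioTask."""
    data = json.loads(raw)
    # task_info is treated as optional here; that is an assumption.
    info = data.get("task_info") or {}
    return AudioTask(
        id=data["id"],
        status=data["status"],
        progress=data["progress"],
        can_cancel=info.get("can_cancel", False),
        estimated_time=info.get("estimated_time"),
    )


sample = """
{
  "created": 1757165031,
  "id": "task-unified-1757165031-uyujaw3d",
  "object": "audio.generation.task",
  "progress": 0,
  "status": "pending",
  "task_info": {"can_cancel": true, "estimated_time": 45},
  "type": "audio"
}
"""
task = parse_task(sample)
print(task.id, task.status)
```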

Authorizations

Authorization
string
header
required

All endpoints require Bearer Token authentication

Add this header to every request:

Authorization: Bearer YOUR_API_KEY
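
For readers not using curl, here is a minimal sketch of the same authenticated request built with Python's standard library. The `build_request` helper is a hypothetical name; the URL and headers are taken from the curl example above. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.foxapi.cc/v1/audios/generations"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_request(
    "YOUR_API_KEY",
    {
        "model": "paraformer-v2",
        "file_urls": ["https://example.com/audio/meeting.wav"],
    },
)
```

Passing `req` to `urllib.request.urlopen` would send it; error handling and timeouts are left out of this sketch.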

Body

application/json
model
enum<string>
default:paraformer-v2
required

Paraformer audio file recognition model

Options:

Value              Description
paraformer-v2      General version; supports multiple languages and any sample rate
paraformer-8k-v2   Telephone scenario; Chinese only; 8 kHz sample rate
Available options:
paraformer-v2,
paraformer-8k-v2
Example:

"paraformer-v2"

file_urls
string[]
required

Audio file URL list

Notes:

  • Supports publicly accessible URLs via HTTP/HTTPS
  • Up to 100 URLs per request
  • Supported formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
  • Each file must not exceed 2 GB in size or 12 hours in duration
Required array length: 1 - 100 elements
Example:
["https://example.com/audio/meeting.wav"]
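
The limits listed above can be checked on the client before submitting a task. The sketch below is one possible pre-flight check, not the server's validation logic; `check_file_urls` is a hypothetical helper, and guessing the format from the URL's file extension is an assumption (the API may determine the format differently). File size and duration are not checked here.

```python
from urllib.parse import urlparse

SUPPORTED_FORMATS = {
    "aac", "amr", "avi", "flac", "flv", "m4a", "mkv", "mov", "mp3",
    "mp4", "mpeg", "ogg", "opus", "wav", "webm", "wma", "wmv",
}
MAX_URLS = 100


def check_file_urls(file_urls: list[str]) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if not 1 <= len(file_urls) <= MAX_URLS:
        problems.append(f"expected 1-{MAX_URLS} URLs, got {len(file_urls)}")
    for url in file_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an HTTP/HTTPS URL: {url}")
            continue
        # Assumption: the format is inferred from the path's extension.
        ext = parsed.path.rsplit(".", 1)[-1].lower() if "." in parsed.path else ""
        if ext not in SUPPORTED_FORMATS:
            problems.append(f"unsupported or unknown format: {url}")
    return problems
```

For example, `check_file_urls(["https://example.com/audio/meeting.wav"])` returns an empty list, while an empty input list yields one problem describing the length requirement.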
language_hints
string[] | null

Language hints for recognition

Notes:

  • Only supported by paraformer-v2
  • Supported language codes: zh (Chinese), en (English), ja (Japanese), yue (Cantonese), ko (Korean), de (German), fr (French), ru (Russian)
Example:
["zh", "en"]
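
A small client-side check for the two notes above might look like this. `check_language_hints` is a hypothetical helper; how the server actually responds to unsupported hints is not documented here.

```python
from typing import Optional

SUPPORTED_LANGUAGE_CODES = {"zh", "en", "ja", "yue", "ko", "de", "fr", "ru"}


def check_language_hints(model: str, language_hints: Optional[list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the hints look valid."""
    if language_hints is None:
        return []  # the field is optional
    problems = []
    if model != "paraformer-v2":
        problems.append(f"language_hints is only supported by paraformer-v2, not {model}")
    unknown = [code for code in language_hints if code not in SUPPORTED_LANGUAGE_CODES]
    if unknown:
        problems.append(f"unsupported language codes: {unknown}")
    return problems
```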
channel_id
integer[] | null

Indices of the audio tracks to process

Notes:

  • Indices start at 0; [0] selects the first track
  • Default is [0] (only the first track is processed)
  • Each specified track is billed independently
Example:
[0]
recognition
object

Recognition configuration

Notes:

  • Includes disfluency filtering, timestamp alignment, hot words, and sensitive word filter settings
  • If not provided, default configuration is used
diarization
object

Speaker diarization configuration

Notes:

  • Includes diarization toggle and speaker count hint
  • If not provided, speaker diarization is not enabled
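
Since every field except `model` and `file_urls` is optional, a request body can be assembled by leaving out whatever was not set. The sketch below uses a hypothetical `build_payload` helper; `recognition` and `diarization` are passed through as opaque dictionaries because their sub-fields are not listed in this section.

```python
from typing import Any, Optional


def build_payload(
    file_urls: list[str],
    model: str = "paraformer-v2",
    language_hints: Optional[list[str]] = None,
    channel_id: Optional[list[int]] = None,
    recognition: Optional[dict[str, Any]] = None,
    diarization: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble the JSON body, omitting optional fields that were not set."""
    payload: dict[str, Any] = {"model": model, "file_urls": list(file_urls)}
    optional = {
        "language_hints": language_hints,
        "channel_id": channel_id,
        "recognition": recognition,
        "diarization": diarization,
    }
    # Omitted fields fall back to the defaults described above.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_payload(
    ["https://example.com/audio/meeting.wav"],
    language_hints=["zh", "en"],
    channel_id=[0],
)
```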

Response

Task created successfully

created
integer

Task creation time (Unix timestamp, in seconds)

Example:

1757165031

id
string

Task ID

Example:

"task-unified-1757165031-uyujaw3d"

model
string

Actual model name used

object
enum<string>

Specific task type

Available options:
audio.generation.task
progress
integer

Task progress percentage (0-100)

Required range: 0 <= x <= 100
Example:

0

status
enum<string>

Task status

Available options:
pending,
processing,
completed,
failed
Example:

"pending"

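Client code often needs to know whether a task can still change state. The sketch below treats `completed` and `failed` as final and `pending` and `processing` as in progress; that reading is an assumption based on the status names, and `is_terminal` is a hypothetical helper.

```python
TERMINAL_STATUSES = {"completed", "failed"}
ALL_STATUSES = {"pending", "processing"} | TERMINAL_STATUSES


def is_terminal(status: str) -> bool:
    """Return True if the task is assumed to be finished (successfully or not)."""
    if status not in ALL_STATUSES:
        raise ValueError(f"unknown task status: {status!r}")
    return status in TERMINAL_STATUSES
```
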
task_info
object

Asynchronous task info

type
enum<string>

Task output type

Available options:
audio
Example:

"audio"