Song Generation

Give a one-line theme; the service writes the lyrics, synthesises the vocals, and produces a finished song. A job takes about 2-4 minutes: the lyrics become available first, then the audio file (flac / wav / mp3).

Endpoints: POST /v1/song/create · GET /v1/song/status

Status: Stable v1.0 · Async (recommended)

Input: theme + mode

Output: lyrics + audio file

Theme: "周末骑单车去看海风把白衬衫吹起来" ("Weekend cycling to the sea, wind in the open shirt")

Mode: Sing (write lyrics + vocals) · style: sunny pop · 210 s

Request: POST /v1/song/create (lyrics → audio)

Output:

[Verse 1]
周末的风吹起白衬衫 (The weekend wind lifts my open shirt)
阳光铺满那条海岸线 (Sunlight spills along the coast)

♪ song.flac · 3:42 · 48 kHz

Limits & Constraints

Parameter	Range
audio_duration	10 ~ 600 s
inference_steps	1 ~ 200 (recommended 64)
batch_size	1 ~ 8 (recommended 1)
audio_format	flac / wav / wav32 / mp3
Job queue size	200
Recommended rhymes	-ai -ang -ou -ing -an
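
These limits can be checked on the client before submitting, which avoids a round trip that would end in a validation error. The following is a minimal sketch in Python: `check_limits` is a hypothetical helper, and treating every bound as inclusive is an assumption, since the server's exact boundary handling is not spelled out here.

```python
ALLOWED_FORMATS = {"flac", "wav", "wav32", "mp3"}


def check_limits(audio_duration=None, inference_steps=64, batch_size=1, audio_format="flac"):
    """Return a list of human-readable problems; an empty list means the values are in range."""
    problems = []
    # audio_duration may be left unset, in which case the server picks a value.
    if audio_duration is not None and not (10 <= audio_duration <= 600):
        problems.append("audio_duration must be within 10-600 s")
    if not (1 <= inference_steps <= 200):
        problems.append("inference_steps must be within 1-200")
    if not (1 <= batch_size <= 8):
        problems.append("batch_size must be within 1-8")
    if audio_format not in ALLOWED_FORMATS:
        problems.append("audio_format must be one of flac / wav / wav32 / mp3")
    return problems
```

Calling `check_limits(audio_duration=220, audio_format="mp3")` returns an empty list; any out-of-range value adds one message per offending field.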

Performance

Measured on an A800 80GB, running the lyricist model and the audio generation model.

Metric	Value
POST response	< 1 s
Lyrics phase	90 ~ 150 s
Audio phase	15 ~ 40 s
Total wall time	~ 2-4 min
GET status	< 100 ms
Public LB cutoff risk	0

Try It

Fill in the parameters below and click Generate; after 2-4 minutes the flac / mp3 file appears on the right.

Request parameters

Theme *

One-line description of the song (Chinese or English); the more concrete the details, the better the lyrics.

Featured Instruments (optional)

Selected instruments are appended to the caption as a hint that tells the audio model which instruments to feature.

Options: drums, bass, guitar, keyboard, strings, synth, brass, woodwinds, percussion, fx, vocals, backing_vocals

Language

Rhyme

Vocal Gender

Options: male, female, any

Target Duration (s)

220 s

Style hint (optional)

Additional Instructions (optional)

Advanced · Lyric Generation

Temperature

Sampling temperature: low = focused and stable, high = diverse. Default 0.85.

Smart theme expansion

Sparse themes are rewritten into a concrete scene before the lyrics are written. Skipped automatically when the theme is longer than 60 characters. Adds one extra LLM call (~1-3 s) but noticeably lifts output quality.

Strict rhyme

Retries once automatically if the rhyme check fails.

Generate album cover

Produced by an external API in parallel with audio generation, so it uses no local VRAM.

Cover ratio

Advanced · Audio Generation

BPM, key, and audio duration may all be left blank; the lyricist then picks recommended values.

BPM

Beats per minute. Ballad 60-80 / pop 90-120 / dance 120-140. Higher = faster and tighter.

Key

e.g. C major / B minor. Minor keys sound sad, major keys sound bright.

Audio Duration (s)

10-600 seconds; 180-240 is typical. Longer audio takes linearly longer to generate.

Format

flac / wav are lossless (large files); mp3 is lossy but small; wav32 is 32-bit float, intended for post-production mixing.

Inference steps

DiT denoising steps. 64 is the general-purpose default; higher values add slightly more detail and increase latency linearly; below 30 the difference is barely audible.

Batch

Number of candidates generated from the same prompt, so you can pick the best. Each +1 costs about 2-3 GB of extra VRAM plus proportional latency.
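
The batch rule of thumb is simple arithmetic, sketched below. `batch_overhead` is a hypothetical helper; the 2-3 GB per extra candidate comes from the description above, and reading "proportional latency" as latency multiplied by the batch size is an assumption.

```python
def batch_overhead(batch_size, base_latency_s):
    """Estimate the extra VRAM range (GB) and total latency (s) for a batch.

    Rule of thumb: each candidate beyond the first costs about 2-3 GB of VRAM,
    and latency is assumed to scale with the batch size.
    """
    if not 1 <= batch_size <= 8:
        raise ValueError("batch_size must be within 1-8")
    extra = batch_size - 1
    vram_low_gb, vram_high_gb = 2 * extra, 3 * extra
    latency_s = base_latency_s * batch_size
    return vram_low_gb, vram_high_gb, latency_s


print(batch_overhead(4, 30))  # (6, 9, 120)
```

So four candidates at a 30 s audio phase would need roughly 6-9 GB more VRAM and about two minutes of audio generation under these assumptions.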

Guidance Scale (CFG)

DiT classifier-free guidance strength. 1-3 gives the model creative freedom but drifts from the theme; 7.0 is the balanced default; 10+ sticks closely to the prompt but tends to introduce noise and digital artifacts. Default 7.0.

use_adg (Adaptive Dynamic Guidance)

Anneals CFG from high to low during denoising: the early steps follow the prompt strictly, and the late steps are left free to fill in detail. When enabled, high CFG values produce fewer artifacts; the benefit is largest when inference steps ≥ 50.
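
To make the annealing idea concrete, here is an illustrative sketch only. The linear schedule, the floor of 1.0, and both function names are assumptions made for illustration; the schedule the service actually uses for use_adg is not documented here. The combination step is the standard classifier-free guidance form, uncond + scale * (cond - uncond).

```python
def annealed_scale(step, total_steps, high=7.0, low=1.0):
    """Linearly interpolate the guidance scale from `high` (first step) to `low` (last step)."""
    if total_steps < 2:
        return high
    t = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return high + (low - high) * t


def cfg_combine(uncond, cond, scale):
    """Standard classifier-free guidance, applied element-wise to two predictions."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


scales = [annealed_scale(s, 5, high=9.0, low=1.0) for s in range(5)]
print(scales)  # [9.0, 7.0, 5.0, 3.0, 1.0]
```

With a scale of 1.0 the combined prediction equals the conditional prediction, which is why annealing toward a low value relaxes the push toward the prompt in the final steps.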


Request & Response

Submit a song-generation job with POST /v1/song/create. The request returns an intent_id immediately (< 1 s); then poll /v1/song/status to track the status and fetch the lyrics and the audio URL.

Request body

Field	Type	Default	Description
theme	string	required	Theme
language	string	"zh"	ISO 639 code: "zh", "en", ...
rhyme	string|null	null	Rhyme suffix
style	string|null	null	Style hint
vocal_gender	enum	"male"	`male` | `female` | `any`
duration_hint	float	220.0	Target duration (10-600 s)
additional_instructions	string|null	null	Extra instructions
temperature	float	0.85	Sampling temperature
vocal_mode	enum	"sing"	`sing` | `instrumental_from_lyrics` | `instrumental_quick`
track_classes	string[]	null	Featured instruments (drums/bass/guitar/...)
bpm	int|null	null	Override the LM-suggested BPM
key_scale	string|null	null	Override the key
duration	float|null	null	Override the duration
audio_format	string	"flac"	flac / wav / wav32 / mp3
inference_steps	int	64	DiT inference steps
guidance_scale	float	7.0	CFG scale
use_adg	bool	false	Adaptive Dynamic Guidance
batch_size	int	1	How many songs to generate per request
generate_cover	bool	true	Generate an album cover in parallel with audio (no local VRAM)
cover_size	string|null	null	Cover aspect ratio: `1:1` / `3:4` / `9:16` / `16:9`, etc.; null = deployment default

Code Examples
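
A minimal Python sketch of submitting a job, using only the standard library. The base URL, the `Authorization: Bearer` header format, and the helper names `build_create_request` and `create_song` are assumptions; the field names come from the request-body table.

```python
import json
import urllib.request


def build_create_request(base_url, token, theme, **options):
    """Build (but do not send) the POST /v1/song/create request."""
    if not theme or not theme.strip():
        raise ValueError("theme is required")
    body = {"theme": theme, **options}  # omitted fields fall back to server defaults
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/song/create",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed header format
        },
    )


def create_song(base_url, token, theme, **options):
    """Send the request and return the decoded JSON response."""
    req = build_create_request(base_url, token, theme, **options)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

A call might look like `create_song("https://your-host", token, "Weekend cycling to the sea", language="en", audio_format="mp3")`, where the host and token are placeholders.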

Response

200 OK · application/json · SongCreateResponse

{
  "code": 200,
  "data": {
    "intent_id": "abc-...-uuid",
    "status":    "lyrics_pending",
    "next_step": {
      "method":            "GET",
      "url":               "/v1/song/status?intent_id=abc-...",
      "poll_interval_sec": 5
    }
  },
  "error": null
}
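
A small sketch of reading this response on the client. `parse_create_response` is a hypothetical helper; the envelope shape (code / data / error) is taken from the sample above, and falling back to 5 s when poll_interval_sec is missing is an assumption.

```python
def parse_create_response(resp):
    """Return (intent_id, status_url, poll_interval_sec) from a create-response dict."""
    if resp.get("error") is not None or resp.get("code") != 200:
        raise RuntimeError(f"create failed: {resp.get('error')!r}")
    data = resp["data"]
    next_step = data.get("next_step", {})
    return (
        data["intent_id"],
        next_step.get("url"),
        next_step.get("poll_interval_sec", 5),  # assumed fallback when the field is absent
    )
```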

GET /v1/song/status is the polling endpoint (recommended interval: every 5 s). Each call returns the current status, lyrics_meta once the lyrics are ready, and audio.audio_url once the job completes.

Query parameters

Field	Type	Required	Description
intent_id	string	required	UUID returned by `/v1/song/create`

Status values

status	Meaning
lyrics_pending	Lyricist is writing the lyrics (up to ~150 s)
audio_pending	Lyrics ready, audio job queued
audio_running	DiT is generating audio
completed	Done; `audio.audio_url` is ready for download
failed	Failed; see the `error` field for the reason
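
In client code, the five states map naturally onto an enum with a helper that tells a polling loop when to stop. This is a sketch; the class name `SongStatus` and the `is_terminal` property are illustrative, not part of the API.

```python
from enum import Enum


class SongStatus(str, Enum):
    LYRICS_PENDING = "lyrics_pending"
    AUDIO_PENDING = "audio_pending"
    AUDIO_RUNNING = "audio_running"
    COMPLETED = "completed"
    FAILED = "failed"

    @property
    def is_terminal(self):
        """True once the job can no longer change state."""
        return self in (SongStatus.COMPLETED, SongStatus.FAILED)
```

`SongStatus("audio_running").is_terminal` is False, so a client keeps polling; an unknown status string raises ValueError, which surfaces server-side additions early.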

Code Examples
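
A minimal polling loop in Python. `fetch_status` is any zero-argument callable that performs GET /v1/song/status?intent_id=... and returns the decoded JSON as a dict; it is injected so the loop stays independent of the HTTP client. The helper name, the 600 s timeout, and the choice of exceptions are assumptions.

```python
import time


def poll_until_done(fetch_status, interval_s=5.0, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job reports completed or failed.

    Returns the final status dict on success, raises RuntimeError on failure,
    and raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(f"song generation failed: {status.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_s} s")
        sleep(interval_s)  # 5 s matches the recommended polling interval
```

With total wall time around 2-4 minutes and a 5 s interval, a typical job needs a few dozen status calls before it completes.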

Response (when completed)

200 OK · application/json · SongStatusResponse

{
  "intent_id":    "abc-...-uuid",
  "status":       "completed",
  "submitted_at": 1715170800.12,
  "completed_at": 1715171064.55,
  "lyrics_meta": { /* title / lyrics_zh / caption / bpm / key_scale / ... */ },
  "audio": {
    "store_status": "completed",
    "file_paths":   ["/path/to/song.flac"],
    "audio_url":    "/v1/audio?path=..."
  },
  "error": null
}
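
A sketch of pulling the useful fields out of a completed status response. `extract_result` is a hypothetical helper, and treating submitted_at / completed_at as Unix timestamps in seconds is an assumption based on the sample values.

```python
def extract_result(status):
    """Return lyrics metadata, audio location, and elapsed time from a completed status dict."""
    if status.get("status") != "completed":
        raise ValueError(f"job not completed: {status.get('status')!r}")
    audio = status.get("audio") or {}
    elapsed_s = None
    if status.get("submitted_at") is not None and status.get("completed_at") is not None:
        # Assumed to be Unix timestamps in seconds.
        elapsed_s = status["completed_at"] - status["submitted_at"]
    return {
        "lyrics_meta": status.get("lyrics_meta"),
        "audio_url": audio.get("audio_url"),
        "file_paths": audio.get("file_paths", []),
        "elapsed_s": elapsed_s,
    }
```

For the sample above, the elapsed time works out to roughly 264 s, which is consistent with a total wall time of 2-4 minutes or slightly more.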

Download the audio file that status.audio.audio_url points to. The endpoint returns a binary stream; the Content-Type depends on the audio format.

Query parameters

Field	Type	Required	Description
path	string	required	URL-encoded audio path (from `status.audio.file_paths`)

Code Examples
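
A sketch of downloading the file with the Python standard library. The helper names, the `Authorization: Bearer` header format, and the extension mapping are assumptions; the mapping only covers the Content-Type values named in this document, and anything else falls back to `.bin`.

```python
import shutil
import urllib.request
from urllib.parse import urljoin

EXTENSIONS = {"audio/flac": ".flac", "audio/wav": ".wav", "audio/mpeg": ".mp3"}


def resolve_audio_url(base_url, audio_url):
    """Turn the relative audio_url from the status response into an absolute URL."""
    return urljoin(base_url.rstrip("/") + "/", audio_url)


def download_audio(base_url, audio_url, token, dest_stem="song"):
    """Stream the audio to disk and return the name of the file that was written."""
    req = urllib.request.Request(
        resolve_audio_url(base_url, audio_url),
        headers={"Authorization": f"Bearer {token}"},  # assumed header format
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        content_type = resp.headers.get_content_type()
        dest = dest_stem + EXTENSIONS.get(content_type, ".bin")
        with open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)  # copies in chunks, so large files are not held in memory
    return dest
```

Because audio_url starts with a slash, `urljoin` resolves it against the host root; if the API is served under a path prefix, the prefix would need to be added explicitly.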

Response

200 OK · audio/* (binary) · AudioStream

Binary audio stream; the Content-Type is, for example, audio/flac, audio/wav, or audio/mpeg.

Error Codes

Code	Meaning	Fix
401	Unauthorized (not logged in, or invalid token)	Call POST /auth/login first to obtain a token.
404	Unknown intent_id	The intent_id is misspelled, or the server restarted (intent state is held in memory).
422	Validation error	Check that theme is non-empty, duration_hint is in (10, 600], etc.
429	Job queue full	Retry later; at most 200 concurrent jobs.
503	Lyricist service not enabled	Set ACESTEP_ENABLE_LYRICIST=true on the server at deployment.
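
One way a client might act on these codes is sketched below. `next_action` is a hypothetical helper, and the backoff numbers (5 s base, doubling, 60 s cap, 5 attempts) are assumptions rather than documented server guidance.

```python
def next_action(http_status, attempt, max_attempts=5):
    """Map an HTTP status from this API to a client action.

    Returns ("retry", delay_seconds), ("login", None), or ("fail", reason).
    `attempt` is the zero-based count of retries already made.
    """
    if http_status == 401:
        return ("login", None)  # obtain a token via POST /auth/login, then resend
    if http_status == 429:
        if attempt >= max_attempts:
            return ("fail", "job queue still full")
        return ("retry", min(5 * 2 ** attempt, 60))  # exponential backoff, capped at 60 s
    reasons = {
        404: "unknown intent_id (typo, or the server restarted)",
        422: "validation error; check theme and duration_hint",
        503: "lyricist service not enabled on the server",
    }
    return ("fail", reasons.get(http_status, f"unexpected HTTP {http_status}"))
```

Only 429 is treated as retryable here: 404, 422, and 503 will not change on a resend, and after a 404 caused by a server restart the job has to be resubmitted.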

AI Integration: Copy Prompt

Paste the tested prompt below into Claude, Cursor, or GPT and let it generate code that calls this API, covering async polling, error handling, and sample inputs and outputs.

AI-READY PROMPT · optimized for Claude 4.7 & GPT-5

Integrate fast with AI

Tested on mainstream coding agents. Includes API shape, auth, error handling, and sample I/O. Paste it and say "build this in my stack".
