Song Generation

Give a one-line theme; the service writes the lyrics, synthesises the vocals, and produces a finished song. Generation takes about 2-4 minutes: the lyrics arrive first, followed by the audio file (flac / wav / mp3).

POST /v1/song/create · GET /v1/song/status
Stable · v1.0 · Async (recommended)
Input: theme + mode
Output: lyrics + audio file
Example
Theme: "Weekend cycling to the sea, wind in the open shirt"
Mode: Sing (write lyrics + sing) · Style: sunny pop · Duration: 210 s
Request: POST /v1/song/create
Pipeline: lyrics → audio
Output:
[Verse 1]
The weekend wind lifts my open shirt
Sunlight spills along the coast
♪ song.flac · 3:42 · 48 kHz
01

Limits & Constraints

Audio duration (audio_duration): 10 ~ 600 s
Inference steps (inference_steps): 1 ~ 200 (recommended 64)
Batch size (batch_size): 1 ~ 8 (recommended 1)
Audio format (audio_format): flac / wav / wav32 / mp3
Job queue size: 200
Recommended rhymes: -ai -ang -ou -ing -an
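
The limits above can be checked on the client before submitting, which avoids a round trip that would end in a validation error. This is a minimal sketch, assuming the bounds are inclusive and that the 10-600 s duration range applies to both the duration_hint and duration request fields; `LIMITS` and `validate_params` are hypothetical helpers, not part of the service.

```python
# Hypothetical client-side pre-check. Bounds are copied from the limits
# listed above and are assumed to be inclusive.
ALLOWED_FORMATS = {"flac", "wav", "wav32", "mp3"}
LIMITS = {
    "duration_hint": (10, 600),    # seconds
    "duration": (10, 600),         # seconds
    "inference_steps": (1, 200),
    "batch_size": (1, 8),
}


def validate_params(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, (low, high) in LIMITS.items():
        value = params.get(name)
        # Fields left out (or set to None) are skipped: the server applies defaults.
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside {low}..{high}")
    fmt = params.get("audio_format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        problems.append(f"audio_format={fmt!r} is not one of {sorted(ALLOWED_FORMATS)}")
    return problems
```

Returning a list rather than raising on the first problem lets a form report every invalid field at once.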
02

Performance

Measured on A800 80GB · lyricist model + audio generation model

POST response: < 1 s
Lyrics phase: 90 ~ 150 s
Audio phase: 15 ~ 40 s
Total wall time: ~ 2-4 min
GET status: < 100 ms
Public LB cutoff risk: 0
03

Try It

Fill in the parameters below and click "Generate Full Song"; after 2-4 minutes the flac / mp3 file appears on the right.

Request parameters
Theme (theme): A one-line description of the song, in Chinese or English; the more concrete the details, the better the lyrics.
Featured instruments (track_classes): The selected instruments are appended to the caption as a hint telling the audio model which instruments to emphasise. Options: drums, bass, guitar, keyboard, strings, synth, brass, woodwinds, percussion, fx, vocals, backing_vocals.
Target duration (duration_hint): 220 s
Advanced · Lyric Generation
Temperature (temperature): 0.85
Theme expansion: Skipped automatically when the theme is already longer than 60 characters. Expansion adds one extra LLM call (~1-3 s) but noticeably improves output quality.
Cover image (generate_cover): Generated through an external API in parallel with the audio; uses no local VRAM.
Advanced · Audio Generation
BPM, key, and duration may all be left blank; the lyricist model then supplies recommended values.
BPM (bpm): Beats per minute. Ballad 60-80 / pop 90-120 / dance 120-140. Higher values sound faster and tighter.
Key (key_scale): e.g. C major / B minor. Minor keys sound sad, major keys sound bright.
Duration (duration): 10-600 seconds; 180-240 is typical. Longer output increases generation time linearly.
Audio format (audio_format): flac / wav are lossless (large); mp3 is lossy but small; wav32 is 32-bit float, intended for post-production mixing.
Inference steps (inference_steps): DiT denoising steps. 64 is the general-purpose default; higher values add slight detail at linearly higher cost; below 30 the difference is barely audible.
Batch size (batch_size): Number of candidates generated for the same prompt, so you can pick the best. Each additional candidate costs roughly 2-3 GB more VRAM and proportionally more latency.
Guidance scale (guidance_scale): DiT classifier-free guidance (CFG) strength. 1-3 is creative but drifts from the theme; 7.0 is the balanced default; 10+ sticks closely to the prompt but tends to introduce noise / digital artifacts. Default: 7.0.
Adaptive guidance (use_adg): Anneals CFG from high to low across denoising, so the early steps follow the prompt strictly and the late steps are free to fill in detail. When enabled, high CFG values produce fewer artifacts; the benefit is largest when inference steps ≥ 50.
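
To make the high → low annealing concrete, here is a minimal sketch of one possible schedule. The linear shape and the end value of 1.0 are assumptions chosen for illustration; the service's actual schedule is not documented here, and `annealed_cfg` is a hypothetical name.

```python
def annealed_cfg(step: int, total_steps: int, start: float = 7.0, end: float = 1.0) -> float:
    """Linearly interpolate the guidance scale from `start` (first step) to `end` (last step).

    Illustrative only: the linear shape and `end` value are assumptions.
    """
    if total_steps < 1 or not (0 <= step < total_steps):
        raise ValueError("step must satisfy 0 <= step < total_steps")
    if total_steps == 1:
        return start
    progress = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return start + (end - start) * progress


schedule = [annealed_cfg(i, 5) for i in range(5)]
print(schedule)  # [7.0, 5.5, 4.0, 2.5, 1.0]
```

Early steps run at the full guidance scale and so stay close to the prompt, while later steps relax toward the lower value.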
04

Request & Response

Submit a song-generation job. The request returns an intent_id immediately (< 1 s); then poll /v1/song/status to track progress and retrieve the lyrics and audio URL.

Request body

Field                    Type          Default   Description
theme                    string        required  Theme
language                 string        "zh"      ISO 639 code: "zh", "en", ...
rhyme                    string|null   null      Rhyme suffix
style                    string|null   null      Style hint
vocal_gender             enum          "male"    male | female | any
duration_hint            float         220.0     Target duration (10-600 s)
additional_instructions  string|null   null      Extra instructions
temperature              float         0.85      Sampling temperature
vocal_mode               enum          "sing"    sing / instrumental_from_lyrics / instrumental_quick
track_classes            string[]      null      Featured instruments (drums/bass/guitar/...)
bpm                      int|null      null      Override LM-suggested BPM
key_scale                string|null   null      Override key
duration                 float|null    null      Override duration
audio_format             string        "flac"    flac / wav / wav32 / mp3
inference_steps          int           64        DiT inference steps
guidance_scale           float         7.0       CFG scale
use_adg                  bool          false     Adaptive Dynamic Guidance
batch_size               int           1         How many songs to generate
generate_cover           bool          true      Generate album cover in parallel with audio (no local VRAM)
cover_size               string|null   null      Aspect ratio: 1:1 / 3:4 / 9:16 / 16:9, etc.; null = deployment default

Code Examples
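
A minimal sketch of submitting a job with Python's standard library. The base URL, the bearer-token header scheme, and the helper names `build_create_request` and `create_song` are assumptions made for illustration; only the path, the field names, and the defaults come from the request body fields listed above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your deployment's address


def build_create_request(theme: str, token: str, **overrides) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/song/create request."""
    body = {"theme": theme, "language": "zh", "vocal_mode": "sing", "audio_format": "flac"}
    body.update(overrides)  # e.g. style="sunny pop", duration_hint=210
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/song/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the header scheme is not documented in this section.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def create_song(theme: str, token: str, **overrides) -> dict:
    """Send the request and return the `data` object of the response."""
    request = build_create_request(theme, token, **overrides)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return payload["data"]  # contains intent_id, status, next_step
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.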


            

Response

200 OK · application/json · SongCreateResponse
{
  "code": 200,
  "data": {
    "intent_id": "abc-...-uuid",
    "status":    "lyrics_pending",
    "next_step": {
      "method":            "GET",
      "url":               "/v1/song/status?intent_id=abc-...",
      "poll_interval_sec": 5
    }
  },
  "error": null
}
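
The next_step block in the response suggests a simple polling loop. The sketch below leaves both the HTTP call and the notion of "finished" to caller-supplied functions, because the terminal status values are not listed in this section; `poll_until`, `fetch_status`, and `is_terminal` are hypothetical names, and the 600 s timeout is an assumed cap.

```python
import time
from typing import Callable


def poll_until(
    fetch_status: Callable[[], dict],
    is_terminal: Callable[[dict], bool],
    interval_sec: float = 5.0,    # matches poll_interval_sec in the response above
    timeout_sec: float = 600.0,   # assumption: generous cap over the ~2-4 min typical run
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() every interval_sec until is_terminal(result) is true.

    Raises TimeoutError if the next wait would exceed timeout_sec.
    """
    waited = 0.0
    while True:
        result = fetch_status()
        if is_terminal(result):
            return result
        if waited + interval_sec > timeout_sec:
            raise TimeoutError(f"job still running after {waited:.0f} s")
        sleep(interval_sec)
        waited += interval_sec
```

In practice fetch_status would issue the GET request named in next_step.url and return the parsed data object, and is_terminal would test for whichever final status values your deployment reports.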
05

Error Codes

401 · Unauthorized (not logged in, or invalid token): Call POST /auth/login first to obtain a token.
404 · Unknown intent_id: The intent_id is misspelled, or the server has restarted (intent state is held in memory).
422 · Validation error: Check that theme is non-empty, duration_hint is within (10, 600], etc.
429 · Job queue full: Retry later; at most 200 concurrent jobs.
503 · Lyricist service not enabled: Set ACESTEP_ENABLE_LYRICIST=true when deploying the server.
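
One way to act on these codes in a client is a small lookup that separates retryable failures from those that need a fix first. This is a minimal sketch; the `Action` names and the choice to treat only 429 as retryable are assumptions drawn from the remedies listed above, not behaviour defined by the service.

```python
from enum import Enum


class Action(Enum):
    LOGIN = "obtain a token via POST /auth/login"
    CHECK_INTENT = "check the intent_id, or resubmit if the server restarted"
    FIX_REQUEST = "correct the request parameters"
    RETRY_LATER = "queue is full; wait and retry"
    CONTACT_OPERATOR = "lyricist service is disabled on the server"
    UNKNOWN = "unexpected status"


# Documented status codes mapped to a suggested client reaction.
ACTIONS = {
    401: Action.LOGIN,
    404: Action.CHECK_INTENT,
    422: Action.FIX_REQUEST,
    429: Action.RETRY_LATER,
    503: Action.CONTACT_OPERATOR,
}


def action_for(status_code: int) -> Action:
    """Return the suggested reaction for an HTTP status code."""
    return ACTIONS.get(status_code, Action.UNKNOWN)


def is_retryable(status_code: int) -> bool:
    """Only a full queue (429) is worth retrying without changing anything."""
    return action_for(status_code) is Action.RETRY_LATER
```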
06

AI Integration: Copy Prompt

Paste the tested prompt below into Claude, Cursor, or GPT and let it generate code that calls this API, covering async polling, error handling, and sample input/output.

AI-READY PROMPT
Optimized for Claude 4.7 & GPT-5

Integrate fast with AI

Verified on mainstream coding agents. The prompt covers API structure, authentication, error handling, and sample input/output. Paste it and say "build this in my stack".


        
07

Samples