WAN video generation
WAN is Alibaba's open video-generation model family. The orchestrator exposes every shipped version, across multiple providers, under a single videoGen step. This recipe walks through the full surface: which version to pick, which provider to route to, and how to invoke each operation.
Versions at a glance
| Version | Providers | Operations | Notes |
|---|---|---|---|
| v2.7 | fal | text-to-video, image-to-video, reference-to-video, edit-video | Current flagship on FAL. Adds edit-video. |
| v2.6 | fal | text-to-video, image-to-video, reference-to-video | FAL production default for new integrations. |
| v2.5 | fal | text-to-video, image-to-video | Still supported; fewer operations than 2.6/2.7. |
| v2.2 | fal, comfy | text-to-video, image-to-video | Only version with a native ComfyUI path. Supports LoRAs + Turbo mode. |
| v2.1 | fal, civitai | text-to-video, image-to-video | Legacy — prefer 2.6+ unless you specifically need Civitai-hosted inference. |
Default choice for new integrations: `version: "v2.6"`, `provider: "fal"`.
The request shape
Every WAN request is a single videoGen step on SubmitWorkflow. Four keys select which WAN variant runs:
```jsonc
{
  "$type": "videoGen",
  "input": {
    "engine": "wan",
    "version": "v2.6",            // 2.1 | 2.2 | 2.5 | 2.6 | 2.7
    "provider": "fal",            // fal | comfy | civitai (version-dependent)
    "operation": "text-to-video"  // see table above
  }
}
```

The orchestrator dispatches to the matching input schema (`Wan26FalTextToVideoInput`, `Wan22ComfyVideoGenInput`, etc.), so only the fields valid for that combination are accepted — SubmitWorkflow will 400 on unknown ones.
Operations
All examples target production and use `<your-token>` in place of your Bearer token. The request timeout is 100 s, so `wait` is capped accordingly. See Results & webhooks for anything longer.
text-to-video
Prompt → video. The most common operation; supported on every WAN version.
```http
POST https://orchestration.civitai.com/v2/consumer/workflows?whatif=false&wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "wan",
      "version": "v2.6",
      "provider": "fal",
      "operation": "text-to-video",
      "prompt": "A serene forest with sunlight filtering through the trees, cinematic quality",
      "resolution": "1080p",
      "aspectRatio": "16:9",
      "duration": 5,
      "enablePromptExpansion": true
    }
  }]
}
```

Request body — edit to customize (e.g. swap the prompt).
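If you prefer to drive the call from code, here is a minimal stdlib-only Python sketch of the request above. The endpoint, query parameters, and input fields come straight from the example; the helper names (`build_text_to_video`, `submit`) are illustrative, not part of the API.

```python
import json
import urllib.request

ORCHESTRATOR = "https://orchestration.civitai.com/v2/consumer/workflows"

def build_text_to_video(prompt, version="v2.6", resolution="1080p",
                        aspect_ratio="16:9", duration=5):
    """Assemble the videoGen request body shown above as a Python dict."""
    return {
        "steps": [{
            "$type": "videoGen",
            "input": {
                "engine": "wan",
                "version": version,
                "provider": "fal",
                "operation": "text-to-video",
                "prompt": prompt,
                "resolution": resolution,
                "aspectRatio": aspect_ratio,
                "duration": duration,
                "enablePromptExpansion": True,
            },
        }]
    }

def submit(body, token, wait=60):
    """POST to SubmitWorkflow; wait is capped by the 100 s request timeout."""
    req = urllib.request.Request(
        f"{ORCHESTRATOR}?whatif=false&wait={wait}",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=100) as resp:
        return json.load(resp)
```

Set `whatif=true` in the query string instead to preview cost without running the job.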
image-to-video
One or more source images animate into a clip. Supported on every version.
```json
{
  "engine": "wan",
  "version": "v2.6",
  "provider": "fal",
  "operation": "image-to-video",
  "images": [
    "https://image.civitai.com/.../19325406.jpeg"
  ],
  "prompt": "A dancing cat moving gracefully",
  "resolution": "1080p",
  "duration": 5
}
```
v2.7 image-to-video uses `startImage` + `endImage` (not `images[]`). Pass `startImage` to seed the first frame and optionally `endImage` to constrain the last frame (useful for loops and transitions). The `images[]` array accepted by v2.6 is not available on v2.7.
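A small sketch of the v2.7 field shape, assuming the `startImage`/`endImage` names described above; the helper name is illustrative only.

```python
def build_v27_image_to_video(start_image, prompt, end_image=None):
    """v2.7 image-to-video input: startImage/endImage, no images[] array."""
    inp = {
        "engine": "wan",
        "version": "v2.7",
        "provider": "fal",
        "operation": "image-to-video",
        "startImage": start_image,
        "prompt": prompt,
    }
    if end_image is not None:
        inp["endImage"] = end_image  # optional: constrain the final frame
    return inp
```

Passing the same URL for both images is a simple way to bias the clip toward a loop.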
reference-to-video (v2.6, v2.7)
Pass one or more reference videos; refer to them from the prompt via @Video1, @Video2, @Video3 to transfer subjects / motion / style.
```json
{
  "engine": "wan",
  "version": "v2.6",
  "provider": "fal",
  "operation": "reference-to-video",
  "referenceVideoUrls": [
    "https://example.com/reference.mp4"
  ],
  "prompt": "@Video1 is walking through a beautiful garden",
  "resolution": "1080p",
  "aspectRatio": "16:9",
  "duration": 5
}
```
Reference video URL
The example above uses https://example.com/reference.mp4 as a placeholder — replace with a real publicly fetchable video URL before submitting.
edit-video (v2.7 only)
Input video + prompt → transformed video. Preserves timing; rewrites content.
```json
{
  "engine": "wan",
  "version": "v2.7",
  "provider": "fal",
  "operation": "edit-video",
  "videoUrl": "https://example.com/input.mp4",
  "prompt": "Transform the scene into a cyberpunk aesthetic with neon lighting",
  "resolution": "1080p",
  "audioSetting": "auto"
}
```
Source video URL
Replace https://example.com/input.mp4 with a real publicly fetchable video URL before submitting.
Common parameters
These appear on most (version, operation) combinations; the schema for your chosen variant is the source of truth.
| Field | Typical values | Notes |
|---|---|---|
| `resolution` | `720p`, `1080p` | 1080p costs more and takes longer. |
| `aspectRatio` | `16:9`, `9:16`, `1:1` | Vertical for reels/shorts. |
| `duration` | `5`, `10` (seconds) | Longer clips push you past the 100 s wait cap — use webhooks. |
| `enablePromptExpansion` | `true` / `false` | Let the model expand short prompts. Disable for reproducibility. |
| `enableSafetyChecker` | `true` (default) | Disable only if you handle moderation yourself. |
| `audioUrl` / `audioSetting` | URL or `auto` | Attach background audio (2.6+) or drive audio inference (2.7 edit). |
Provider-specific features
FAL (all versions)
Hosted inference with low queue time. FAL is the production default. enablePromptExpansion and audio attachment only exist on FAL variants.
Comfy (v2.2 only)
Runs on Civitai's ComfyUI workers. Three features aren't available on FAL:

- LoRAs via the `loras` array with AIR identifiers: `"loras": [{ "air": "urn:air:lora:civitai:123456@789012", "strength": 0.8 }]`
- Turbo mode (`useTurbo: true`) plus frame-interpolator models (`interpolatorModel: "film"`) for faster runs at lower quality
- Multi-step workflows — chain `videoGen` → `videoInterpolation` → `videoUpscaler` in one `steps` array
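The Comfy-only extras can be sketched as a single `videoGen` input. The field names (`loras`, `useTurbo`, `interpolatorModel`) come from the list above; the helper name and the default strength are illustrative.

```python
def build_comfy_v22_input(prompt, lora_air, strength=0.8, turbo=True):
    """videoGen input for the v2.2 Comfy path with the Comfy-only extras."""
    return {
        "engine": "wan",
        "version": "v2.2",
        "provider": "comfy",
        "operation": "text-to-video",
        "prompt": prompt,
        # AIR identifier selects the LoRA; strength scales its influence
        "loras": [{"air": lora_air, "strength": strength}],
        "useTurbo": turbo,                # faster, lower quality
        "interpolatorModel": "film",      # frame interpolation
    }
```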
Civitai (v2.1 only)
Legacy self-hosted path. Accepts explicit model AIRs and width/height instead of resolution/aspectRatio. Migrate to FAL 2.6+ unless you have a specific reason.
Reading the result
On success each videoGen step emits a single video blob:
```json
{
  "id": "wf_01HXYZ...",
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call GetBlob for a fresh URL; don't cache them long-term. Download and store the bytes yourself if you need durable storage.
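A small sketch for pulling the signed URLs out of a workflow response with the shape shown above (the helper name is illustrative):

```python
def extract_video_urls(workflow):
    """Collect signed video URLs from succeeded videoGen steps.
    The URLs expire, so download promptly or refetch the workflow."""
    urls = []
    for step in workflow.get("steps", []):
        video = (step.get("output") or {}).get("video")
        if step.get("status") == "succeeded" and video:
            urls.append(video["url"])
    return urls
```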
Long-running jobs
WAN jobs routinely run longer than 100 s (any 1080p clip ≥ 10 s; reference-to-video; edit-video). The 100-second request timeout means `wait` is capped — use `wait=90` for a best-effort inline attempt, then fall back to:
- Webhooks (preferred): register a callback with `type: ["workflow:succeeded", "workflow:failed"]` — see Results & webhooks
- Polling: `GET /v2/consumer/workflows/{workflowId}` on a 5 s → 10 s → 30 s cadence
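The polling cadence above can be sketched like this. `fetch` stands in for your GET call; the terminal status values (`succeeded`, `failed`) match the responses shown in this recipe but should be checked against the API reference.

```python
import time
from itertools import islice

def poll_delays():
    """The 5 s -> 10 s -> 30 s cadence suggested above."""
    yield 5
    yield 10
    while True:
        yield 30

def poll_workflow(fetch, max_attempts=40, sleep=time.sleep):
    """fetch() returns the workflow JSON, e.g. from
    GET /v2/consumer/workflows/{workflowId}."""
    for delay in islice(poll_delays(), max_attempts):
        workflow = fetch()
        if workflow.get("status") in ("succeeded", "failed"):
            return workflow
        sleep(delay)
    raise TimeoutError("workflow still processing after max_attempts polls")
```

Injecting `sleep` keeps the loop testable and lets you swap in an async-friendly wait.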
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| 400 with unknown field | Field isn't valid for this (version, provider, operation) combo | Check the specific `Wan<X><Provider><Op>Input` schema via SubmitWorkflow. |
| Step failed, `error.code = "no_provider"` | No capacity for that resolution/duration on the chosen provider | Retry, drop to 720p, or switch provider. |
| `workflow:processing` after `wait=90` returns | Job ran past the 100 s timeout | Expected — continue via webhook or poll. |
| Blob URL 403 after a few minutes | Signed URL expired | Refetch the workflow to get a fresh URL. |
| Reference prompt ignored | `@VideoN` tokens missing or misnumbered | Tokens are 1-indexed and must match items in `referenceVideoUrls`. |
Cost
Billed in Buzz on the workflow's transactions. Use whatif=true for an exact preview; see Payments (Buzz) for currency selection.
WAN video pricing varies by version, provider, resolution, and acceleration flags. The totals below are per video; where a per-second rate is given, multiply it by the clip duration.
v2.7 (FAL)
Flat per-second across resolutions:
`total = 130 × duration_seconds`

- 5 s → ~650 Buzz
- 10 s → ~1,300 Buzz
v2.6 (FAL)
Resolution-scaled per-second:
| Resolution | Buzz per second |
|---|---|
| 720p | 130 |
| 1080p | 195 |
- 720p × 5 s → ~650 Buzz
- 1080p × 5 s → ~975 Buzz
v2.5 (FAL)
Resolution-scaled per-second:
`total = 100 × resolutionFactor × duration`, with resolutionFactor = 1 (480p) / 2 (720p) / 3 (1080p).
- 720p × 5 s → 1 000 Buzz
- 720p × 10 s → 2 000 Buzz
- 480p × 5 s → 500 Buzz
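The v2.5 formula is a one-liner in code; this sketch (helper name illustrative) reproduces the bullet figures above:

```python
def wan25_cost(resolution, duration):
    """total = 100 x resolutionFactor x duration (the v2.5 formula)."""
    factor = {"480p": 1, "720p": 2, "1080p": 3}[resolution]
    return 100 * factor * duration
```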
v2.2 (FAL, text-to-video / image-to-video)
Driven by Turbo / LoRA / standard, plus resolution:
| Mode | 720p | 580p | 480p |
|---|---|---|---|
| Turbo | ~130 / video (flat) | ~97.5 | ~65 |
| With LoRA | ~94.9 × duration | ~94.9 × duration | ~94.9 × duration |
| Standard | ~104 × duration | ~78 × duration | ~52 × duration |
Typical: 720p Turbo 5 s → ~130; 720p standard 5 s → ~520; 720p LoRA 5 s → ~475.
v2.2-5b (FAL)
| Mode | Buzz per video |
|---|---|
| Fast-wan 720p | ~32.5 |
| Distill | ~75.9 |
| Standard | ~142.35 |
| Image-to-video | ~142.35 (flat) |
v2.2 (Comfy, Civitai-hosted)
Variable per-pixel-per-step formula with an 8× markup, minimum 100 Buzz, rounded up to the nearest 25:
```
areaCost = max(a × width × height + b, 0)   // per-frame per-step compute factor
compute  = length × steps × areaCost
buzz     = max(100, ceil((compute × 420/3600 × 8) / 25) × 25)
```

Where (a, b) is (1.22e-6, -0.14) for image-to-video, (2.53e-7, -0.0259) for text-to-video. This path is noticeably more expensive per second than the FAL routes — FAL is the default for a reason.
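Because the formula has a floor and a rounding step, it is easy to get wrong by hand; here is a direct Python transcription (function name illustrative, constants as published above):

```python
import math

# (a, b) coefficients per operation, from the formula above
COEFFS = {
    "image-to-video": (1.22e-6, -0.14),
    "text-to-video": (2.53e-7, -0.0259),
}

def comfy_v22_buzz(width, height, length, steps, operation):
    """v2.2 Comfy pricing: per-pixel-per-step compute with an 8x markup,
    a 100 Buzz minimum, rounded up to the nearest 25."""
    a, b = COEFFS[operation]
    area_cost = max(a * width * height + b, 0)   # per-frame per-step factor
    compute = length * steps * area_cost
    return max(100, math.ceil((compute * 420 / 3600 * 8) / 25) * 25)
```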
v2.1 (legacy)
`total = 100 × resolutionFactor × duration`, with resolutionFactor = 1 (480p) / 2 (720p) / 3 (1080p). 720p × 5 s → ~1,000 Buzz.
Quick reference
For new integrations on v2.6 / v2.7 at 720p × 5 s with no LoRAs, expect ~650–975 Buzz per video. Always whatif=true before long-duration / high-res submissions — costs scale linearly with duration and can escalate fast.
Related
- SubmitWorkflow — operation used by every example here
- GetWorkflow — for polling
- Results & webhooks — production-ready result handling
- Full parameter catalog: the `Wan<version><Provider><Operation>Input` schemas in the API reference (e.g. `Wan26FalTextToVideoInput`, `Wan27FalEditVideoInput`)
- `videoGen` endpoint OpenAPI spec — standalone OpenAPI 3.1 YAML covering the full `videoGen` surface (WAN, LTX2, Flux, etc.); import into Postman / OpenAPI Generator