
# Wan video LoRA training

> **Preview ecosystem**
>
> Wan video training is currently marked *Preview* in the orchestrator. The endpoint accepts requests and `whatif=true` cost previews work, but actual training runs may not be available on every worker fleet. Reach out via the Civitai Discord before integrating against production traffic.

Train a Wan video LoRA on a small set of source video clips using AI Toolkit. The output is a video LoRA usable in Wan text-to-video and image-to-video generation.

| modelVariant | Wan family | Buzz / epoch |
| --- | --- | --- |
| 2.1 | Wan 2.1 (14B) | 12 |
| 2.2 | Wan 2.2 (14B-A14B) | 12 |

> **Long-running step**
>
> Video training is the slowest training mode on the platform: expect single-digit minutes per epoch even on a 4-clip dataset. Always submit with `wait=0` and follow up via webhook or polling.
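
For example, a minimal polling sketch. The submit call returns immediately with a workflow id; the GET-by-id endpoint below is an assumption extrapolated from the collection URL used on this page, so confirm it against the Quick start:

```http
GET https://orchestration.civitai.com/v2/consumer/workflows/<workflowId>
Authorization: Bearer <your-token>
```

Poll on a generous interval (epochs take minutes, not seconds) until the workflow reports a terminal status, then collect the epoch artifacts described under Reading the result.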

## The request shape

```json
{
  "$type": "training",
  "input": {
    "engine":       "ai-toolkit",
    "ecosystem":    "wan",
    "modelVariant": "2.1"        // 2.1 | 2.2
  }
}
```

## Prerequisites

- A Civitai orchestration token (Quick start → Prerequisites)
- A training-data zip containing source video clips, each a few seconds long at most and at similar resolutions (see the layout sketch after this list)
- An accurate count of the clips in the zip
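
A sketch of a workable zip layout; the per-clip `.txt` caption files are an assumption based on common AI Toolkit dataset conventions, not something this page requires:

```text
training-data.zip
├── clip-01.mp4
├── clip-01.txt   ← optional caption for clip-01 (assumed convention)
├── clip-02.mp4
├── clip-03.mp4
└── clip-04.mp4
```

With four clips in the zip, `trainingData.count` in the request should be 4.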

## Wan 2.1 / 2.2

Both variants share the same input shape and per-epoch cost; pick the one that matches your inference target. The example below uses 2.1; swap modelVariant to "2.2" for Wan 2.2 training (no other change required).

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training", "video"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "wan",
      "modelVariant": "2.1",
      "epochs": 2,
      "resolution": 512,
      "lr": 0.0002,
      "trainTextEncoder": false,
      "lrScheduler": "constant",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/5418/2202966TrainingData.Kjwp.zip",
        "count": 4
      },
      "samples": {
        "prompts": ["a video of TOK", "TOK moving in a garden"]
      }
    }
  }]
}
```

## Common parameters

Defaults shown are the post-ApplyDefaults values for Wan.

| Field | Required | Default | Notes |
| --- | --- | --- | --- |
| engine | ✓ | | Always `ai-toolkit`. |
| ecosystem | ✓ | | Always `wan` for this page. |
| modelVariant | ✓ | | `2.1` or `2.2`. |
| epochs | | 5 | 1–20. Billed per epoch. Keep low (2–5) for video; the per-epoch step count is much higher than for image training. |
| numberOfRepeats | | (no auto-default for Wan) | 1–5000. |
| lr | | 0.0001 | 0.0002 is a typical override for video; see the example above. |
| trainTextEncoder | | false | Leave off; Wan training does not benefit from text-encoder updates. |
| lrScheduler | | cosine | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. |
| optimizerType | | adamw8bit | See the SDXL/SD1 page for the full enum. |
| networkDim | | 32 | 1–256. |
| networkAlpha | | matches networkDim | 1–256. |
| noiseOffset | | 0 | 0–1. |
| flipAugmentation | | false | Random horizontal flips. |
| shuffleTokens / keepTokens | | false / 0 | Caption-tag shuffling. |
| triggerWord | | (none) | Activation token. Per the orchestrator source, not all video ecosystems support triggerWord; leave it empty if you see schema rejections. |
| trainingData.{type, sourceUrl, count} | ✓ | | `type: "zip"`. The zip should contain video clips. |
| samples.prompts[] | | [] | Per-epoch preview videos rendered with the trained LoRA. |
| samples.negativePrompt | | (none) | |
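
Putting the table together, a sketch of an `input` block that overrides several tunables. The values are illustrative, not tuned recommendations, and `triggerWord` is left out per the schema caveat above:

```json
{
  "engine": "ai-toolkit",
  "ecosystem": "wan",
  "modelVariant": "2.2",
  "epochs": 3,
  "numberOfRepeats": 10,
  "lr": 0.0002,
  "lrScheduler": "constant_with_warmup",
  "optimizerType": "adamw8bit",
  "networkDim": 32,
  "networkAlpha": 32,
  "flipAugmentation": true,
  "trainingData": { "type": "zip", "sourceUrl": "<your-zip-url>", "count": 4 },
  "samples": { "prompts": ["a video of TOK"], "negativePrompt": "blurry, static" }
}
```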

## Reading the result

Same envelope as the other training recipes (see SDXL/SD1 → Reading the result). Each epoch yields a video LoRA `.safetensors` blob plus any sample `.mp4` files. The trained LoRA is usable in Wan video generation by referencing it in the `loras` field.
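
As a sketch only: the `loras` field name comes from this page, but the surrounding generation step shape below is a placeholder assumption; check the Wan video generation page for the real schema.

```json
{
  "$type": "videoGen",
  "input": {
    "prompt": "a video of TOK walking through a garden",
    "loras": [
      { "air": "<identifier of the epoch .safetensors you picked>", "strength": 1.0 }
    ]
  }
}
```

Here `$type`, `air`, and `strength` are all illustrative placeholders, not confirmed field names.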

## Runtime

Per-epoch wall time, default settings on a 4-clip dataset:

| Variant | Per-epoch | Typical full run |
| --- | --- | --- |
| 2.1 | ~3–10 min | 6–20 min for 2 epochs |
| 2.2 | ~3–10 min | 6–20 min for 2 epochs |

Always use `wait=0`.

## Cost

total = 12 × epochs   (Buzz, base cost)

The per-epoch cost is 12 Buzz, per the orchestrator source. Sample-prompt rendering uses Wan video-generation rates (much higher than image samples) and is billed separately. Run with `whatif=true` to see the exact pre-flight charge (see the sketch after the table below).

| Configuration | Buzz (training only) |
| --- | --- |
| epochs: 2 | 24 + samples |
| epochs: 5 | 60 + samples |
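
As a concrete pre-flight check, the cost preview is the submit call with `whatif=true` substituted for `wait=0` and the identical body (the response shape is not reproduced here):

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?whatif=true
Authorization: Bearer <your-token>
Content-Type: application/json

{ ... same body as the training request above ... }
```

For the `epochs: 2` example, the base training charge works out to 12 × 2 = 24 Buzz, with sample rendering billed on top.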

## Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| 400 with "modelVariant required" | Missing `modelVariant` | Set it to `"2.1"` or `"2.2"`. |
| Step starts then fails immediately | Preview ecosystem not yet enabled on the routing GPU fleet | Contact Civitai support; Wan training is rolling out. |
| Step failed with a VRAM-related error | Resolution × clip length too high for the worker | Lower the resolution (e.g. to 512) and shorten clips to ≤ 3 seconds. |
| Trained LoRA produces static / no motion | Too few epochs, or too few / too short clips | Raise epochs to 3–5; ensure the clips show the motion you want learned. |
| Step failed with `moderationStatus: "Rejected"` | Dataset failed content moderation | Replace the flagged clips. |
