
Wan video LoRA training

Preview ecosystem

Wan video training is currently marked Preview in the orchestrator. The endpoint accepts requests and whatif=true cost previews work, but actual training runs may not be available on every worker fleet. Reach out via Civitai Discord before integrating against production traffic.

Train a Wan video LoRA on a small set of source video clips using AI Toolkit. The output is a video LoRA usable in Wan text-to-video and image-to-video generation.

| modelVariant | Wan family | Default price |
| --- | --- | --- |
| 2.1 | Wan 2.1 (14B) | 3000 Buzz |
| 2.2 | Wan 2.2 (14B-A14B) | 3000 Buzz |

Long-running step

Video training is the slowest training mode on the platform: a 2000-step run on a 4-clip dataset typically takes 20–60 minutes (see Runtime). Always use wait=0 and follow up via webhook or polling.

The request shape

json
{
  "$type": "training",
  "input": {
    "engine":       "ai-toolkit",
    "ecosystem":    "wan",
    "modelVariant": "2.1"        // 2.1 | 2.2
  }
}

Prerequisites

  • A Civitai orchestration token (Quick start → Prerequisites)
  • A training-data zip containing source video clips (keep each clip to a few seconds at most, at similar resolutions)
  • An accurate count of clips in the zip
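
Because the request must carry an accurate clip count, it can help to derive the number from the zip itself instead of typing it by hand. The following is a minimal sketch using only the Python standard library. The helper name and the list of video extensions are assumptions for illustration; this page does not state which container formats the trainer accepts.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed set of video extensions; the accepted formats are not listed on this page.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".mkv", ".avi"}


def count_video_clips(zip_source) -> int:
    """Count video files in a zip given as a path or a binary file-like object."""
    with zipfile.ZipFile(zip_source) as archive:
        count = 0
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = PurePosixPath(info.filename)
            # Skip hidden files such as macOS resource forks ("._clip.mp4").
            if name.name.startswith("."):
                continue
            if name.suffix.lower() in VIDEO_EXTENSIONS:
                count += 1
        return count
```

The returned value is what you would pass as trainingData.count.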

Wan 2.1 / 2.2

Both variants share the same input shape and per-step cost; pick the one that matches your inference target. The example below uses 2.1; swap modelVariant to "2.2" for Wan 2.2 training (no other change required).

http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training", "video"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "wan",
      "modelVariant": "2.1",
      "steps": 2000,
      "resolution": 512,
      "lr": 0.0002,
      "trainTextEncoder": false,
      "lrScheduler": "constant",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/5418/2202966TrainingData.Kjwp.zip",
        "count": 4
      },
      "samples": {
        "prompts": ["a video of TOK", "TOK moving in a garden"]
      }
    }
  }]
}
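
The same request can be assembled from Python. This is a minimal sketch with the standard library, assuming the endpoint, headers, and body fields shown in the example above. The function name and its parameters are illustrative, not part of any Civitai SDK, and the sketch only builds the request without sending it.

```python
import json
import urllib.request

WORKFLOWS_URL = "https://orchestration.civitai.com/v2/consumer/workflows"


def build_training_request(token, source_url, clip_count, model_variant="2.1"):
    """Build (but do not send) a Wan training workflow request with wait=0."""
    body = {
        "tags": ["training", "video"],
        "steps": [{
            "$type": "training",
            "priority": "normal",
            "retries": 2,
            "input": {
                "engine": "ai-toolkit",
                "ecosystem": "wan",
                "modelVariant": model_variant,  # "2.1" or "2.2"
                "steps": 2000,
                "resolution": 512,
                "lr": 0.0002,
                "trainingData": {
                    "type": "zip",
                    "sourceUrl": source_url,
                    "count": clip_count,
                },
                "samples": {"prompts": ["a video of TOK"]},
            },
        }],
    }
    return urllib.request.Request(
        f"{WORKFLOWS_URL}?wait=0",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would submit it. The response envelope is described under Reading the result and is not parsed here.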

Common parameters

Defaults shown are the post-ApplyDefaults values for Wan.

| Field | Required | Default | Notes |
| --- | --- | --- | --- |
| engine | yes | | Always ai-toolkit. |
| ecosystem | yes | | Always wan for this page. |
| modelVariant | yes | | 2.1 or 2.2. |
| steps | | 2000 | 1–10000. Total training steps. Primary driver of training length. |
| epochs | | 10 | 1–20. Number of saved checkpoints delivered, each separately downloadable. Each adds 200 Buzz; Wan's per-epoch preview samples are videos and expensive to compute, so keep the epoch count modest. |
| batchSize | | 1 | Fixed at 1 for this ecosystem. |
| continueFrom | | (none) | A previously-trained urn:air:wan:lora:... AIR to resume from (see Continue training). Must be a Wan LoRA. |
| lr | | 0.0001 | 0.0002 is a typical override for video; see example. |
| trainTextEncoder | | false | Leave off; Wan training does not benefit from text-encoder updates. |
| lrScheduler | | cosine | constant, constant_with_warmup, cosine, linear, step. |
| optimizerType | | adamw8bit | See SDXL/SD1 page for full enum. |
| networkDim | | 32 | 1–256. |
| networkAlpha | | matches networkDim | 1–256. |
| noiseOffset | | 0 | 0–1. |
| flipAugmentation | | false | Random horizontal flips. |
| shuffleTokens / keepTokens | | false / 0 | Caption-tag shuffling. |
| triggerWord | | (none) | Activation token. Per the source, not all video ecosystems support triggerWord; leave empty if you see schema rejections. |
| trainingData.{type, sourceUrl, count} | yes | | type: "zip". Zip should contain video clips. |
| samples.prompts[] | | [] | Preview videos rendered at each saved checkpoint with the trained LoRA. |
| samples.negativePrompt | | (none) | |
| samples.cfgScale | | (ecosystem default) | Overrides the CFG / guidance scale used when rendering the preview samples. |
| samples.strength | | 1.0 | Trained-LoRA weight applied in the preview samples. |
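
A client-side pre-flight check can catch out-of-range values before a request is sent. The sketch below encodes the limits as read from the parameter table (steps 1–10000, epochs 1–20, networkDim and networkAlpha 1–256, noiseOffset 0–1, batchSize fixed at 1). The function name is hypothetical and the orchestrator remains the authority on what it accepts, so treat this as a convenience, not a substitute for server validation.

```python
ALLOWED_VARIANTS = {"2.1", "2.2"}
ALLOWED_SCHEDULERS = {"constant", "constant_with_warmup", "cosine", "linear", "step"}

# Inclusive (low, high) bounds as read from the parameter table.
NUMERIC_RANGES = {
    "steps": (1, 10000),
    "epochs": (1, 20),
    "networkDim": (1, 256),
    "networkAlpha": (1, 256),
    "noiseOffset": (0, 1),
}


def validate_wan_input(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if params.get("engine") != "ai-toolkit":
        problems.append("engine must be 'ai-toolkit'")
    if params.get("ecosystem") != "wan":
        problems.append("ecosystem must be 'wan'")
    if params.get("modelVariant") not in ALLOWED_VARIANTS:
        problems.append("modelVariant must be '2.1' or '2.2'")
    for name, (low, high) in NUMERIC_RANGES.items():
        value = params.get(name)
        # Optional fields are only checked when the caller supplies them.
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    scheduler = params.get("lrScheduler")
    if scheduler is not None and scheduler not in ALLOWED_SCHEDULERS:
        problems.append(f"lrScheduler '{scheduler}' is not a listed value")
    if params.get("batchSize", 1) != 1:
        problems.append("batchSize is fixed at 1 for Wan")
    return problems
```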

Continue training / train further

To resume from a Wan LoRA you already trained instead of starting from the base checkpoint, set continueFrom to that LoRA's AIR. The new run starts from those weights and the new epochs build on top:

json
{
  "$type": "training",
  "input": {
    "engine": "ai-toolkit",
    "ecosystem": "wan",
    "modelVariant": "2.1",
    "continueFrom": "urn:air:wan:lora:civitai:<id>@<version>",
    "steps": 1000
  }
}

continueFrom must point at a LoRA of the same ecosystem (a Wan LoRA) as the model being trained — a mismatched ecosystem is rejected.
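
Since a mismatched ecosystem is rejected, a quick client-side check on the AIR prefix can save a round trip. This is a minimal sketch, assuming Wan LoRA AIRs begin with urn:air:wan:lora: as in the example above. The helper name is hypothetical, and the prefix test is only a heuristic: the orchestrator performs the real check.

```python
WAN_LORA_AIR_PREFIX = "urn:air:wan:lora:"


def continue_training_input(continue_from: str, steps: int, model_variant: str = "2.1") -> dict:
    """Build a training step that resumes from an existing Wan LoRA."""
    if not continue_from.startswith(WAN_LORA_AIR_PREFIX):
        raise ValueError(
            f"continueFrom must be a Wan LoRA AIR (expected prefix {WAN_LORA_AIR_PREFIX!r})"
        )
    return {
        "$type": "training",
        "input": {
            "engine": "ai-toolkit",
            "ecosystem": "wan",
            "modelVariant": model_variant,
            "continueFrom": continue_from,
            "steps": steps,
        },
    }
```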

Reading the result

The result uses the same envelope as the other training recipes; see SDXL/SD1 → Reading the result. Each saved checkpoint yields a video LoRA .safetensors blob plus any sample .mp4 files. The trained LoRA is usable in Wan video generation by referencing it in the loras field.

Runtime

Wall time, default settings on a 4-clip dataset:

| Variant | Per 100 steps | Typical full run |
| --- | --- | --- |
| 2.1 | ~1–3 min | 20–60 min for 2000 steps |
| 2.2 | ~1–3 min | 20–60 min for 2000 steps |

Always use wait=0.
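
With wait=0 the caller has to check back later. The sketch below shows one way to structure a polling loop for a run that may take up to an hour. It is deliberately generic: fetch_status is a caller-supplied function, and both it and the terminal status names are assumptions, because this page does not document the status endpoint or its values.

```python
import time


def poll_until_done(
    fetch_status,
    terminal_statuses,
    interval_s=30.0,
    timeout_s=7200.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call fetch_status() until it returns a terminal status or the timeout passes.

    fetch_status: zero-argument callable returning the current status string
    (hypothetical; wire it to whatever status lookup you use).
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal_statuses:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"workflow still '{status}' after {timeout_s} seconds")
        sleep(interval_s)
```

The sleep and clock parameters exist so the loop can be tested without real waiting. A webhook avoids polling altogether where one is available.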

Cost

Training is billed per step plus a flat per-epoch storage surcharge, with a price floor:

price = steps × costPerStep + epochs × 200       (rounded)
costPerStep = 0.5   (2.1 and 2.2)
floor: never less than 80% of the default-configuration price

Wan's per-epoch surcharge is 200 Buzz (not 10 like image ecosystems) because each epoch's preview samples are videos and expensive to compute, so at default settings most of Wan's cost comes from the epoch count, not the step count. The default run is 2000 steps / 10 epochs: 2000 × 0.5 + 10 × 200 = 1000 + 2000 = 3000 Buzz. The floor is 80% of the default price (2400 Buzz).

Lowering epochs saves the most. Sample-prompt rendering itself uses Wan video-generation rates and is billed separately. Run with whatif=true to see the exact pre-flight charge.

| Configuration | Buzz (training only) |
| --- | --- |
| default (steps: 2000, epochs: 10) | 3000 + samples |
| steps: 2000, epochs: 20 | 5000 + samples (each extra checkpoint adds 200) |
| steps: 1000, epochs: 5 | 2400 + samples (floor) |
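
The pricing rule above is easy to reproduce locally for rough budgeting. The following sketch implements the stated formula and the 80% floor, and it matches the three configurations listed. The constant and function names are illustrative, the exact rounding mode used by the orchestrator is not specified here (Python's built-in round is an assumption), and sample rendering is excluded because it is billed separately. A whatif=true request remains the authoritative figure.

```python
COST_PER_STEP = 0.5      # Buzz per step, same for 2.1 and 2.2
EPOCH_SURCHARGE = 200    # Buzz per saved epoch
DEFAULT_STEPS = 2000
DEFAULT_EPOCHS = 10
FLOOR_FRACTION = 0.8     # never less than 80% of the default-configuration price


def raw_price(steps: int, epochs: int) -> int:
    """Price before the floor is applied."""
    return round(steps * COST_PER_STEP + epochs * EPOCH_SURCHARGE)


def estimate_training_buzz(steps: int = DEFAULT_STEPS, epochs: int = DEFAULT_EPOCHS) -> int:
    """Estimated training-only charge in Buzz, excluding sample rendering."""
    floor = round(FLOOR_FRACTION * raw_price(DEFAULT_STEPS, DEFAULT_EPOCHS))
    return max(raw_price(steps, epochs), floor)
```

For example, steps=1000 with epochs=5 computes to 1500 before the floor, so the estimate is lifted to 2400.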

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| 400 with "modelVariant required" | Missing modelVariant | Set to "2.1" or "2.2". |
| Step starts then fails immediately | Preview ecosystem not yet enabled on the routing GPU fleet | Contact Civitai support; Wan training is rolling out. |
| Step failed with VRAM-related error | Resolution × clip length too high for the worker | Lower resolution (e.g. to 512), shorten clips to ≤ 3 seconds. |
| Trained LoRA produces static / no motion | Too few steps, too few / too short clips | Raise steps; ensure clips show the motion you want learned. |
| Step failed, moderationStatus: "Rejected" | Dataset failed content moderation | Replace flagged clips. |

Civitai Developer Documentation