---
url: /orchestration/recipes/ace-step-audio.md
---

# ACE-Step music generation

[ACE-Step 1.5](https://github.com/ace-step/ACE-Step) is an open text-to-music model that produces full songs from a style description plus structured lyrics. The orchestrator exposes it through a single `aceStepAudio` step, which runs on Civitai's ComfyUI workers. The default checkpoint is the 2B turbo model (`ace_step_1.5_turbo_aio.safetensors`) — an eight-step distillation that generates a 30-second song in ~10 s of worker time.

Without a cover image the step emits an MP3 audio blob. Attach `cover.imageUrl` and the output is an MP4 video with that image as the still background, sized 512×512.

## Variants

There's one step type and one invocation path; the only variant axis is the optional `diffusionModel` override, which swaps the underlying diffusion checkpoint.

All values come from Comfy-Org's [`ace_step_1.5_ComfyUI_files`](https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files) HuggingFace bundle. The default (unset) is the 2B turbo all-in-one checkpoint.

| Variant | `diffusionModel` | Params | `steps` | `cfg` | Best for |
|---|---|---|---|---|---|
| 2B turbo AIO *(unset)* | `urn:air:ace:checkpoint:huggingface:Comfy-Org/ace_step_1.5_ComfyUI_files@main/checkpoints/ace_step_1.5_turbo_aio.safetensors` | 2B | `8` | `1.0` | **Default.** Single all-in-one file; fastest path. |
| 2B turbo | `urn:air:ace:checkpoint:civitai:2549270@2864880` | 2B | `8` | `1.0` | Split-file equivalent of the default AIO. Prefer the AIO unless you're already pulling split files. |
| 2B base | `urn:air:ace:checkpoint:civitai:2549270@2864864` | 2B | `50` | `~4` | Non-turbo 2B base — higher fidelity than turbo at the cost of sampling time. |
| XL turbo | `urn:air:ace:checkpoint:civitai:2549270@2864949` | 4B | `8` | `1.0` | More fidelity at turbo speed. Higher VRAM; slower first-submission while the worker pulls the split files. |
| XL base | `urn:air:ace:checkpoint:civitai:2549270@2864892` | 4B | `50` | `~4` | Highest-fidelity base 4B. Non-turbo; typically slowest. |
| XL SFT | `urn:air:ace:checkpoint:civitai:2549270@2864917` | 4B | `50` | `~4` | Supervised-fine-tuned 4B; sibling of XL base with the same runtime characteristics. |

Turbo variants are distilled to converge in 8 steps with CFG effectively off (`1.0`). Non-turbo base / SFT variants expect the full 50-step schedule with classifier-free guidance on (around `4`) — submitting them with the default `steps: 8` / `cfg: 1.0` produces underbaked output.

**Default choice for new integrations**: omit `diffusionModel` entirely. The 2B turbo AIO file is the default and is what Civitai's workers are consistently warm on. Reach for an XL split-file override only when the default fidelity isn't enough and you can tolerate a slow first submission while the worker pulls the additional files.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A `musicDescription` — a short, genre-prefixed style blurb (e.g. `"Neo-Soul: warm Rhodes, brush kit, introspective"`)
* A `lyrics` string — structured with section markers (`[Verse]`, `[Chorus]`, `[Bridge]`, …). Use `""` for pure instrumentals (and set `vocalWeight: 0.0` / `instrumentalWeight: 1.0`)
* A `seed` — any integer; same seed + same input reproduces the track deterministically

## Default (2B turbo, audio-only)

The default path — no `diffusionModel` override, no cover. Output is an MP3 blob.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "aceStepAudio",
    "input": {
      "musicDescription": "Neo-Soul: A warm, organic neo-soul track with smooth Rhodes chords, mellow bass, and gentle drums. Soulful and introspective mood.",
      "lyrics": "[Verse 1]\nSunlight breaks through the morning haze\nCoffee steam rising, starting the day\n\n[Chorus]\nThis is the rhythm of my life\nSimple moments, pure delight",
      "duration": 30,
      "bpm": 95,
      "key": "D major",
      "language": "en",
      "seed": 12345
    }
  }]
}
```

## Instrumental (no vocals)

Drop vocals by pairing an empty `lyrics` string with `vocalWeight: 0.0` and `instrumentalWeight: 1.0`. The model still needs both fields — an empty `lyrics` with the default `vocalWeight` of 0.9 will produce scat-like placeholder vocals.
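A minimal Python sketch of that pairing follows. The helper name `instrumental_step` is hypothetical and exists only for illustration; the field names and values are the ones documented on this page.

```python
import json


def instrumental_step(music_description: str, seed: int, duration: int = 30) -> dict:
    """Build an aceStepAudio step that suppresses vocals (hypothetical helper)."""
    return {
        "$type": "aceStepAudio",
        "input": {
            "musicDescription": music_description,
            "lyrics": "",               # required field; empty for instrumentals
            "vocalWeight": 0.0,         # overrides the 0.9 default
            "instrumentalWeight": 1.0,  # overrides the 0.85 default
            "duration": duration,
            "seed": seed,
        },
    }


body = {"steps": [instrumental_step("Cinematic Orchestral: sweeping strings, bold brass", seed=3)]}
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the request body for the `POST /v2/consumer/workflows` call shown in the other examples.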

## Audio with cover image (MP4 output)

Attach `cover.imageUrl` and the step emits a `video` blob (`.mp4`) with the image as a static 512×512 background instead of an MP3.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "aceStepAudio",
    "input": {
      "musicDescription": "Rock: A driving rock track with powerful guitars and thundering drums.",
      "lyrics": "[Intro]\n[Verse]\nBreaking through the walls tonight\nNothing is gonna stop this fight",
      "duration": 30,
      "bpm": 140,
      "key": "E minor",
      "seed": 42,
      "cover": {
        "imageUrl": "https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/07f78344-e165-4e96-8340-caf0e562f070/anim=false,width=450,optimized=true/1.jpeg"
      }
    }
  }]
}
```

`cover.imageUrl` accepts either a plain URL string or a workflow `$ref` pointing at an earlier step's output (e.g. chain an `imageGen` step to generate the album art, then feed it into `aceStepAudio` — see [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism)).

## Switching the diffusion model

Set `diffusionModel` to a full AIR URN. The 2B turbo AIO is the default; everything else is a drop-in override.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "aceStepAudio",
    "input": {
      "musicDescription": "Cinematic Orchestral: Sweeping strings, bold brass, and thundering percussion.",
      "lyrics": "",
      "duration": 30,
      "bpm": 110,
      "key": "D minor",
      "instrumentalWeight": 1.0,
      "vocalWeight": 0.0,
      "seed": 3,
      "diffusionModel": "urn:air:ace:checkpoint:civitai:2549270@2864949"
    }
  }]
}
```

The split-file XL checkpoints require the worker to download them on first use, so a fresh submission can sit in `scheduled` for a minute or two before a worker is warm. Use the `wait=60` resume loop (see [Runtime](#runtime)) or webhooks — don't wait on a single `wait=60` POST for the first XL call.

## Parameters

| Field | Required | Default | Notes |
|---|---|---|---|
| `musicDescription` | ✅ | — | Style / genre description. Prefix with a genre label (`"Neo-Soul:"`, `"Jazz:"`) for best results. |
| `lyrics` | ✅ | — | Structured lyrics with `[Verse]`, `[Chorus]`, `[Bridge]` markers. Use `""` for pure instrumentals. |
| `seed` | ✅ | — | Any `int32`. Same inputs + same seed reproduce the track. |
| `duration` | | `60` | Seconds, range `1`–`190`. Longer durations increase Buzz linearly — see [Cost](#cost). |
| `bpm` | | `120` | Beats per minute, range `40`–`200`. |
| `timeSignature` | | `"4"` | Beats per measure. `"3"` / `"4"` / `"6"` common. |
| `language` | | `"en"` | Language code — `en`, `zh`, `ja`, `ko`, … |
| `key` | | `"C major"` | Musical key, e.g. `"E minor"`, `"Bb major"`. |
| `instrumentalWeight` | | `0.85` | Range `0.0`–`1.0`. Raise toward `1.0` for instrumental-heavy mixes. |
| `vocalWeight` | | `0.9` | Range `0.0`–`1.0`. Set to `0.0` when `lyrics` is empty or you want a pure instrumental. |
| `diffusionModel` | | *(2B turbo AIO)* | Full AIR URN for the diffusion checkpoint. See the [Variants](#variants) table. |
| `cover.imageUrl` | | *(none)* | URL (or workflow `$ref`) to a cover image. When set, output is an MP4 video with the image as the 512×512 background instead of an MP3. |

## Reading the result

Audio-only runs emit a single `audio` blob (MP3):

```json
{
  "status": "succeeded",
  "cost": { "total": 4 },
  "steps": [{
    "name": "$0",
    "$type": "aceStepAudio",
    "status": "succeeded",
    "output": {
      "blob": {
        "type": "audio",
        "id": "blob_....mp3",
        "available": true,
        "url": "https://orchestration-new.civitai.com/v2/consumer/blobs/blob_....mp3?sig=...&exp=...",
        "urlExpiresAt": "2027-04-14T15:13:40Z",
        "duration": 30,
        "jobId": "..."
      }
    },
    "jobs": [{
      "id": "...",
      "status": "succeeded",
      "startedAt": "2026-04-14T15:13:28.512Z",
      "completedAt": "2026-04-14T15:13:37.319Z",
      "cost": 4
    }]
  }]
}
```

Fields:

* **`blob.type`** — `"audio"` for MP3 output (no cover), `"video"` when `cover.imageUrl` was supplied (MP4 output).
* **`blob.id`** — stable blob key, ending in `.mp3` or `.mp4`.
* **`blob.url`** — signed URL. Fetch within `urlExpiresAt` or refetch the workflow / call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
* **`blob.duration`** — on audio blobs only, the requested duration in seconds (echoes `input.duration`). Video blobs omit this and expose `width` / `height` (both 512) instead.
* **`blob.available`** — `true` once the file is persisted. Whatif previews return `false` because no job actually ran.

When `cover.imageUrl` is set, `blob` is a video blob with the same shape: `type: "video"`, an `.mp4` extension, `width: 512`, and `height: 512`. Although a comment in the C# source says "WebM", the current Civitai pipeline emits MP4.
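As a rough illustration, the fields above can be read from a parsed response like this. `media_from_step` is a hypothetical helper, and it assumes the step dictionary has the shape shown in the JSON sample on this page.

```python
def media_from_step(step: dict) -> tuple[str, str]:
    """Return (kind, url) for a succeeded aceStepAudio step.

    kind is "audio" (MP3, no cover) or "video" (MP4, cover supplied).
    """
    if step.get("status") != "succeeded":
        raise ValueError(f"step is not succeeded: {step.get('status')!r}")
    blob = step["output"]["blob"]
    if not blob.get("available"):
        # Happens for whatif previews, where no job actually ran.
        raise ValueError("blob is not available")
    return blob["type"], blob["url"]


sample_step = {
    "status": "succeeded",
    "output": {"blob": {"type": "audio", "available": True, "url": "https://example.invalid/blob.mp3"}},
}
kind, url = media_from_step(sample_step)
print(kind, url)
```

The `example.invalid` URL is a placeholder. A real `blob.url` is signed and should be fetched before `urlExpiresAt`.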

## Runtime

Measured end-to-end against `orchestration.civitai.com` on 2026-04-14:

| Shape | POST → terminal |
|---|---|
| `duration: 30`, 2B turbo, no cover | ~15 s (job itself ~9 s) |
| `duration: 60`, 2B turbo, no cover | ~15 s (job itself ~14 s) |
| `duration: 30`, 2B turbo, with cover image | ~13 s (job itself ~7 s) |
| `duration: 30`, XL turbo (4B) cold worker | >60 s (needs `wait=60` resume loop; worker had to pull split files) |

The 2B turbo default beats the 60-s long-poll window comfortably for every duration up to the 190-s cap, so **submit with `wait=60` and expect the POST itself to return terminal state**. If it doesn't (cold XL variant, capacity pressure), the response comes back non-terminal at the 60-s ceiling — re-issue `GET /v2/consumer/workflows/{id}?wait=60` in a loop until the response is terminal. See [Results & webhooks](/orchestration/guide/results-and-webhooks) for the resume pattern.
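The resume pattern can be sketched in Python as below. This is an illustration under stated assumptions, not an official client: `wait_for_terminal`, `http_get_json`, and the `max_polls` cap are hypothetical names, and the terminal statuses are assumed to be `succeeded` and `failed` because those are the only ones this page shows.

```python
import json
import urllib.request

BASE_URL = "https://orchestration.civitai.com"
# Assumption: only the statuses shown on this page are treated as terminal.
TERMINAL = frozenset({"succeeded", "failed"})


def http_get_json(url: str, token: str) -> dict:
    """GET a URL with the Bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    # Client timeout is longer than the 60 s server-side long-poll window.
    with urllib.request.urlopen(req, timeout=75) as resp:
        return json.load(resp)


def wait_for_terminal(workflow_id, token, fetch=http_get_json, max_polls=10, terminal=TERMINAL):
    """Re-issue GET ...?wait=60 until the workflow status is terminal."""
    url = f"{BASE_URL}/v2/consumer/workflows/{workflow_id}?wait=60"
    for _ in range(max_polls):
        workflow = fetch(url, token)
        if workflow.get("status") in terminal:
            return workflow
    raise TimeoutError(f"workflow {workflow_id} not terminal after {max_polls} polls")
```

The `fetch` parameter is injectable so the loop can be exercised without network access. In production the default `http_get_json` would be used with the workflow ID returned by the initial POST.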

For backend integrations that can't hold a connection, register a webhook URL and submit with `wait=0` (fire-and-forget).

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Cost is driven purely by `duration`: a flat base charge plus a per-second factor. Nothing else in the input affects price (model variant, BPM, cover image, instrumental weights, and lyrics length are all free).

```
total = 1 + duration × 0.1
```

| Shape | Buzz |
|---|---|
| `duration: 10` (shortest useful clip) | 2 |
| `duration: 30` (default recipe example) | **4** |
| `duration: 60` (schema default) | 7 |
| `duration: 90` (typical full song) | 10 |
| `duration: 180` (near max, 3-minute track) | 19 |

Arithmetic check against the formula: `1 + 30 × 0.1 = 4` ✅, `1 + 60 × 0.1 = 7` ✅, `1 + 180 × 0.1 = 19` ✅. Production `whatif` previews confirmed these exact Buzz figures on 2026-04-14. The orchestrator surfaces the raw `Factors["total"]` value, so non-integer formula outputs (e.g. `duration: 15` → `2.5`) pass through unchanged in `cost.total`; the handler applies no `Math.Ceiling` / `Math.Round`.
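The formula is easy to mirror client-side for budgeting. The sketch below is a hypothetical helper (`ace_step_cost` is not part of any SDK), and `whatif=true` remains the authoritative preview. It divides by 10 rather than multiplying by `0.1` to sidestep binary floating-point rounding.

```python
def ace_step_cost(duration: int) -> float:
    """Estimate Buzz for an aceStepAudio step: 1 + duration x 0.1."""
    if not 1 <= duration <= 190:
        raise ValueError("duration must be between 1 and 190 seconds")
    return 1 + duration / 10


for seconds in (10, 30, 60, 90, 180):
    print(seconds, ace_step_cost(seconds))
```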

Cover images, key, BPM, time signature, language, and instrumental / vocal weights don't affect Buzz price — ACE-Step bills flat-plus-per-second on duration only.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `400` with `"duration must be between 1 and 190"` (or similar range complaint) | `duration` outside `[1, 190]`, `bpm` outside `[40, 200]`, or a weight outside `[0.0, 1.0]` | Clamp the field to the range in the parameters table. |
| `400` with `"musicDescription is required"` / `"lyrics is required"` / `"seed is required"` | Missing one of the three required fields. `lyrics: ""` is valid; the field itself must still be present. | Include every required field explicitly. |
| `400` with `"Unable to analyze … file"` on the cover image | `cover.imageUrl` pointed at a host that rejected the orchestrator's fetch (range requests, UA block, ALB cookie gating) | Use a Civitai CDN URL, or generate the cover with an `imageGen` step and `$ref` its output. |
| Output has scat-like placeholder vocals on an "instrumental" track | `lyrics: ""` but `vocalWeight` left at default `0.9` | Set `vocalWeight: 0.0` (and ideally `instrumentalWeight: 1.0`) whenever `lyrics` is empty. |
| Step `failed`, `reason = "blocked"` | Content moderation on the description / lyrics / cover image | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |
| Workflow stuck in `scheduled` for >60 s on an XL `diffusionModel` override | No warm worker has the split-file checkpoint yet; the first submission of a given XL variant triggers a download | Keep polling with `?wait=60`; subsequent submissions in the same hour land on the now-warm worker in ~15 s. |
| Request timed out (`wait=60` returned non-terminal) | Cold XL variant, capacity pressure, or `duration` near 190 s on a busy shard | Re-issue `GET /v2/consumer/workflows/{id}?wait=60` until the response is terminal. |
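For the range errors in the first row, a client can clamp values before submitting. This is a minimal sketch: `clamp_input` is a hypothetical helper, and the ranges are copied from the parameters table on this page.

```python
# Ranges taken from the Parameters table.
RANGES = {
    "duration": (1, 190),
    "bpm": (40, 200),
    "instrumentalWeight": (0.0, 1.0),
    "vocalWeight": (0.0, 1.0),
}


def clamp_input(step_input: dict) -> dict:
    """Return a copy of the input with ranged fields clamped into bounds."""
    clamped = dict(step_input)
    for field, (low, high) in RANGES.items():
        if field in clamped:
            clamped[field] = min(max(clamped[field], low), high)
    return clamped


print(clamp_input({"duration": 300, "bpm": 20, "vocalWeight": 1.5}))
```

Silent clamping is a design choice. Some integrations may prefer to reject out-of-range values so the caller notices the mistake.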

## Related

* [`InvokeAceStepAudioStepTemplate`](/orchestration/reference/operations/InvokeAceStepAudioStepTemplate) — the per-recipe endpoint
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — generic path for chaining `aceStepAudio` into multi-step workflows
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for the `wait=60` resume loop
* [Transcription](./transcription) — inverse direction (audio → text); chain after `aceStepAudio` to auto-caption a track
* [Text-to-speech](./text-to-speech) — sibling audio recipe for spoken output
* [Flux 2 image generation](./flux2) — common upstream for generating cover art to feed into `cover.imageUrl`
* [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) — for chaining an `imageGen` cover generator into this step
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running submissions (cold XL variants, webhooks)
* Full parameter catalog: the `AceStepAudioInput` schema in the [API reference](/orchestration/reference/)
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/aceStepAudio/openapi.yaml) — standalone OpenAPI 3.1 YAML for this endpoint

---

---
url: /site/guide/air.md
description: >-
  The AI Resource Identifier (AIR) URN format used across Civitai and the
  Orchestration API.
---

# AIR identifiers

An **AI Resource Identifier** (AIR) is the canonical URN-style string Civitai
uses to reference any AI resource — a checkpoint, LoRA, VAE, embedding, or
upscaler — consistently across the site API, the Orchestration API, and
partner integrations.

Every response from [`GET /model-versions/{id}`](../reference/model-versions#get-a-model-version)
includes an `air` field you can pass directly to generation APIs.

## Format

```
urn:air:{ecosystem}:{type}:{source}:{id}[@{version}][+{fileId}][.{format}]
```

The `urn:` and `air:` prefixes are both optional — parsers accept
`urn:air:sdxl:checkpoint:civitai:827184@2514310`,
`air:sdxl:checkpoint:civitai:827184@2514310`, and bare
`sdxl:checkpoint:civitai:827184@2514310` interchangeably. **Use the full
`urn:air:...` form** in API requests; it's the unambiguous canonical form.

### Fields

| Field | Required | Description |
|-------|----------|-------------|
| `ecosystem` | Optional | Model family bucket: `sd15`, `sdxl`, `sd3`, `flux1`, `other`, etc. |
| `type` | Optional | Resource kind: `checkpoint`, `lora`, `embedding`, `vae`, `controlnet`, `upscaler`. |
| `source` | Required | Hosting system: `civitai`, `civitai-r2`, `huggingface`, `orchestrator`. |
| `id` | Required | Resource identifier within the source. For `civitai`, this is the **model ID**. |
| `version` | Optional | Specific version (for `civitai` this is the model version ID). If omitted, the resource's default/latest version is implied. |
| `fileId` | Optional | Specific `ModelFile` id, prefixed with `+`. Disambiguates between multiple files attached to the same version (e.g. a pruned vs. full-weight checkpoint, or a base model shipped alongside its text-encoder file). Omit to let the resolver pick the primary file. |
| `format` | Optional | Model file format, e.g. `safetensor`, `ckpt`, `diffuser`. |
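To make the field layout concrete, here is a deliberately narrow Python sketch of a parser. It is not Civitai's parser. It only handles `civitai`-source AIRs with numeric IDs, it requires both `ecosystem` and `type` to be present, and the names `Air`, `parse_civitai_air`, and `to_canonical` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Air:
    ecosystem: str
    type: str
    model_id: int
    version_id: Optional[int] = None
    file_id: Optional[int] = None
    format: Optional[str] = None


# Assumption: numeric id / version / fileId, as in the civitai examples.
_CIVITAI_AIR = re.compile(
    r"^(?:urn:)?(?:air:)?"
    r"(?P<ecosystem>[a-z0-9_-]+):(?P<type>[a-z0-9_-]+):civitai:"
    r"(?P<id>\d+)(?:@(?P<version>\d+))?(?:\+(?P<file>\d+))?(?:\.(?P<format>[a-z]+))?$"
)


def _opt_int(text: Optional[str]) -> Optional[int]:
    return int(text) if text is not None else None


def parse_civitai_air(value: str) -> Air:
    """Parse a civitai-source AIR; the urn: and air: prefixes are optional."""
    match = _CIVITAI_AIR.match(value)
    if match is None:
        raise ValueError(f"not a civitai AIR this sketch understands: {value!r}")
    g = match.groupdict()
    return Air(g["ecosystem"], g["type"], int(g["id"]),
               _opt_int(g["version"]), _opt_int(g["file"]), g["format"])


def to_canonical(air: Air) -> str:
    """Re-emit the full urn:air:... form recommended for API requests."""
    text = f"urn:air:{air.ecosystem}:{air.type}:civitai:{air.model_id}"
    if air.version_id is not None:
        text += f"@{air.version_id}"
    if air.file_id is not None:
        text += f"+{air.file_id}"
    if air.format is not None:
        text += f".{air.format}"
    return text


print(to_canonical(parse_civitai_air("sdxl:checkpoint:civitai:827184@2514310")))
```

AIRs from other sources (`civitai-r2`, `huggingface`) use non-numeric IDs and versions, so this sketch rejects them rather than guessing where the optional `format` suffix begins.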

## Real examples

From actual `GET /api/v1/model-versions/{id}` responses and internal workflow
templates:

```
urn:air:sdxl:checkpoint:civitai:827184@2514310
urn:air:sdxl:checkpoint:civitai:827184@2514310+2402203
urn:air:illustrious:checkpoint:civitai:795765@900661
urn:air:other:upscaler:civitai:147759@164821
urn:air:other:other:civitai-r2:civitai-worker-assets@sam_vit_b_01ec64.pth
```

The second example pins the AIR to a specific file on the version (e.g.
`waiIllustriousSDXL_v160.safetensors`, file id `2402203`) — useful when a
version ships multiple downloadable artifacts and you need to be explicit
about which one to load. The last one is a file asset (SAM ViT-B checkpoint)
stored on Civitai's R2 bucket rather than a model version.

## Type values

The `type` segment maps to Civitai's `ModelType` enum:

| AIR type | Civitai `ModelType` |
|----------|---------------------|
| `checkpoint` | `Checkpoint` |
| `lora` | `LORA` |
| `embedding` | `TextualInversion` |
| `vae` | `VAE` |
| `controlnet` | `Controlnet` |
| `upscaler` | `Upscaler` |

Resources that don't map to one of those (motion modules, detection models,
wildcards, etc.) use `other` as the type.

## Using AIR with the Orchestration API

The Orchestration API accepts AIR strings anywhere a resource is referenced.
Given a `modelVersionId` from the site API, the simplest way to get a valid
AIR is to call `GET /api/v1/model-versions/{id}` and forward the `air` field.

For example, to use `WAI-illustrious-SDXL v16.0` in a text-to-image workflow:

1. `curl https://civitai.com/api/v1/model-versions/2514310` →
   `"air": "urn:air:sdxl:checkpoint:civitai:827184@2514310"`
2. Pass that string as the checkpoint reference in your
   [Orchestration submission](/orchestration/guide/submitting-work).

## Building an AIR by hand

You can also construct an AIR directly from a Civitai model version:

```
urn:air:{baseModel}:{type}:civitai:{modelId}@{versionId}[+{fileId}]
```

Where `baseModel` comes from the model version's `baseModel` field
(`SDXL 1.0` → `sdxl`, `SD 1.5` → `sd15`, etc.) and `type` maps from the
parent model's `type` field as shown in the table above. Append
`+{fileId}` (using a `ModelFile.id` from `files[]` on the model version
response) only when you need to pin a specific file; otherwise the resolver
picks the primary file.

The site-generated `air` field already handles this mapping — prefer it over
hand-construction when you have the option.
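When hand-construction is unavoidable, the template can be sketched as follows. `build_air` is a hypothetical helper, and `BASE_MODEL_SLUGS` contains only the two `baseModel` mappings this page states; any other base model would need its slug confirmed against a site-generated `air` value.

```python
# Only the mappings stated on this page; extend after verifying real `air` values.
BASE_MODEL_SLUGS = {"SDXL 1.0": "sdxl", "SD 1.5": "sd15"}

# ModelType enum value -> AIR type segment, from the "Type values" table.
MODEL_TYPE_SLUGS = {
    "Checkpoint": "checkpoint",
    "LORA": "lora",
    "TextualInversion": "embedding",
    "VAE": "vae",
    "Controlnet": "controlnet",
    "Upscaler": "upscaler",
}


def build_air(base_model, model_type, model_id, version_id, file_id=None):
    """Assemble urn:air:{baseModel}:{type}:civitai:{modelId}@{versionId}[+{fileId}]."""
    ecosystem = BASE_MODEL_SLUGS[base_model]  # KeyError for unmapped base models
    air_type = MODEL_TYPE_SLUGS.get(model_type, "other")  # unmapped types use "other"
    air = f"urn:air:{ecosystem}:{air_type}:civitai:{model_id}@{version_id}"
    if file_id is not None:
        air += f"+{file_id}"  # pin a specific ModelFile
    return air


print(build_air("SDXL 1.0", "Checkpoint", 827184, 2514310))
```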

---

---
url: /orchestration/recipes/anima.md
---

# Anima image generation

Anima is an anime-focused image generation ecosystem on Civitai's sdcpp workers. It has a single engine path and one operation (`createImage`, with no img2img or edit support), with defaults tuned for anime and illustration output:

* `engine: "sdcpp"`, `ecosystem: "anima"`
* **Only `createImage`** — Anima doesn't expose `createVariant` or `editImage`. Use [Flux 2 Klein](./flux2#klein-createvariant-img2img) or [Qwen](./qwen) if you need img2img or prompt-driven editing.
* Higher default `steps` (`30`) and lower default `cfgScale` (`4`) than the SD ecosystems — tuned for anime output
* Supports LoRAs for style/character injection
* No checkpoint URN needed — the ecosystem ships its own model; an optional `diffuserModel` override exists for advanced cases

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* No checkpoint URN required — Anima uses a built-in diffuser

## Text-to-image

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "anima",
      "operation": "createImage",
      "prompt": "masterpiece, best quality, 1girl, solo, portrait, looking at viewer, cinematic lighting",
      "negativePrompt": "worst quality, low quality, blurry, bad anatomy, deformed hands",
      "width": 1024,
      "height": 1024,
      "cfgScale": 4,
      "steps": 30
    }
  }]
}
```

### Parameters

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `prompt` | — ✅ | ≤ 10 000 chars | Booru-style tags work best. Lead with quality boosters (`masterpiece, best quality, …`). |
| `negativePrompt` | *(none)* | ≤ 10 000 chars | Recommended. `worst quality, low quality, blurry, bad anatomy, deformed hands` is a solid starting point. |
| `width` / `height` | `1024` | `64`–`2048`, divisible by 16 | Anima is trained around 1024² and behaves well at aspect ratios near that pixel count. |
| `cfgScale` | `4` | `0`–`30` | **Lower than SD1/SDXL's 7.** `3`–`5` is the sweet spot for Anima. |
| `steps` | `30` | `1`–`150` | **Higher than most sdcpp defaults.** `25`–`35` typical. |
| `sampleMethod` | `euler` | enum | [`SdCppSampleMethod`](/orchestration/reference/). |
| `schedule` | `simple` | enum | [`SdCppSchedule`](/orchestration/reference/). |
| `loras` | `{}` | `{ airUrn: strength }` | Stack multiple; `0.6`–`1.0` strengths typical. |
| `diffuserModel` | *(built-in)* | AIR URN | Optional override for the diffuser. The default built-in model is what you want in almost every case. |
| `quantity` | `1` | `1`–`12` | Number of images per call. |
| `seed` | random | int64 | Pin for reproducibility. |

### Aspect-ratio variants

Anima handles non-square aspect ratios well near ~1 megapixel total area — similar guidance to SDXL. Well-behaved dimensions include 1024², 1152×896, 1344×768, 1536×640, and their mirrors.

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "anima",
      "operation": "createImage",
      "prompt": "masterpiece, best quality, cyberpunk anime scene, neon city street at night",
      "negativePrompt": "worst quality, low quality, blurry",
      "width": 1344,
      "height": 768,
      "cfgScale": 4,
      "steps": 30
    }
  }]
}
```
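The size constraints from the parameters table (each side between `64` and `2048`, divisible by 16) can be checked before submitting. The sketch below is a hypothetical client-side helper, not server behaviour; it also reports the area relative to 1024² so you can see how close a shape is to the recommended pixel count.

```python
def check_anima_size(width: int, height: int) -> float:
    """Validate width/height and return the area as a fraction of 1024 x 1024."""
    for name, value in (("width", width), ("height", height)):
        if not 64 <= value <= 2048:
            raise ValueError(f"{name} must be between 64 and 2048, got {value}")
        if value % 16 != 0:
            raise ValueError(f"{name} must be divisible by 16, got {value}")
    return (width * height) / (1024 * 1024)


for w, h in ((1024, 1024), (1152, 896), (1344, 768), (1536, 640)):
    print(f"{w}x{h}: {check_anima_size(w, h):.3f} of 1024^2")
```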

### With LoRAs

Anima LoRAs are a map of AIR URN → strength. Style LoRAs usually sit at `0.6`–`1.0`; character / concept LoRAs often higher:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "anima",
      "operation": "createImage",
      "prompt": "masterpiece, best quality, detailed portrait of a magical girl in a forest",
      "negativePrompt": "worst quality, low quality",
      "width": 1024,
      "height": 1024,
      "cfgScale": 4,
      "steps": 30,
      "loras": {
        "urn:air:anima:lora:civitai:123456@789012": 0.8
      }
    }
  }]
}
```

Only Anima-tagged LoRAs work on the `anima` ecosystem.

## Reading the result

A successful `imageGen` step emits an `images[]` array — one entry per `quantity`:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.jpeg" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Runtime

Typical wall time per 1024×1024 image is 10–25 s. `wait=60` works comfortably for `quantity ≤ 2`. Higher `steps` counts and larger dimensions compound runtime; submit with `wait=0` and poll for large batches or atypical aspect ratios.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Per-pixel + per-step scaling against 1024² / 25 steps:

```
total = 8 × (width × height / 1024²) × (steps / 25) × quantity
```

| Shape | Buzz |
|-------|------|
| 1024²/`steps: 30`/`quantity: 1` (defaults) | **~9.6** |
| 1024²/`steps: 30`/`quantity: 4` | ~38 |
| 1344×768/`steps: 30` | ~7.9 × 1.2 ≈ **~9.5** |
| 1024²/`steps: 40` | ~12.8 |
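For quick budgeting, the formula translates directly into Python. `anima_cost` is a hypothetical helper that mirrors the formula on this page; `whatif=true` remains the authoritative preview.

```python
def anima_cost(width: int = 1024, height: int = 1024, steps: int = 30, quantity: int = 1) -> float:
    """Estimate Buzz: 8 x (pixels / 1024^2) x (steps / 25) x quantity."""
    pixel_factor = (width * height) / (1024 ** 2)
    step_factor = steps / 25
    return 8 * pixel_factor * step_factor * quantity


print(round(anima_cost(), 2))            # defaults: 1024x1024, 30 steps, 1 image
print(round(anima_cost(quantity=4), 2))  # same shape, 4 images
```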

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "operation must be createImage" | Passed `editImage` or `createVariant` | Anima only supports `createImage`. Use [Qwen](./qwen) or [Flux 2 Klein](./flux2#klein-createvariant-img2img) for img2img / edit on anime-style inputs. |
| `400` with "ecosystem must be anima" | Typo | Lowercase `"anima"`. |
| `400` with "model is not a valid property" | Sent `model` field | Anima has no checkpoint picker — delete the field, or if overriding, use `diffuserModel` instead. |
| Output looks flat or off-style | `cfgScale: 7` (SD default) on Anima | Drop to `cfgScale: 4`. Anima wants lower guidance than SD1/SDXL. |
| Output underbakes | `steps` too low for the prompt complexity | Bump to `steps: 30`–`40`. Anima's default is already `30` — don't go much below `20`. |
| LoRA has no effect | Wrong AIR URN, model private / not published, or ecosystem mismatch | Verify the URN on the LoRA's Civitai page; only Anima-tagged LoRAs work on the `anima` ecosystem. |
| Request timed out (`wait` expired) | Large `quantity`, atypical dimensions, or high `steps` | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Prompt hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Qwen image generation](./qwen) — alternative with edit + variant operations and LoRA support
* [SDXL image generation](./sdxl) — higher-fidelity general-purpose alternative
* [Flux 2](./flux2) / [Flux 1](./flux1) image generation — newer open-weights families
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref`
* Full parameter catalog: the `AnimaCreateImageGenInput` schema in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator

---

---
url: /orchestration/reference.md
---

# API Reference

Every consumer-facing operation, request schema, and response shape in the Civitai Orchestration API. Pages here are generated from the OpenAPI specification ([`v2-consumers.json`](https://orchestration.civitai.com/openapi/v2-consumers.json)) and stay in sync with the running API on every build.

## Conventions

* **Base URL**: `https://orchestration.civitai.com`
* **Auth**: `Authorization: Bearer <token>` on every request.
* **Content type**: `application/json` for bodies; blob upload endpoints accept `multipart/form-data` or presigned PUT.
* **IDs**: workflow IDs are ULIDs prefixed `wf_`; blob IDs are prefixed `blob_`.
* **Polymorphism**: workflow step bodies use a `$type` discriminator; request/response schemas list all valid subtypes under `oneOf`.

## Entry points

Most consumer integrations only touch three operations:

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — create a workflow with one or more steps
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — poll a single workflow
* [`QueryWorkflows`](/orchestration/reference/operations/QueryWorkflows) — list / filter workflows

The left sidebar is grouped by OpenAPI tag — **Workflows**, **WorkflowSteps**, **Recipes**, **Blobs**, **Resources**. Recipes have per-endpoint variants (one per job type) if you prefer the typed surface over the polymorphic `SubmitWorkflow` body.

## Rate limits & quotas

::: info Stub
Fill in once the per-tier rate limit scheme is finalized.
:::

---

---
url: /orchestration/guide/authentication.md
---

# Authentication

All consumer endpoints require `Authorization: Bearer <token>` on every request.

## Getting an API key

Manage your API keys from your Civitai account at **[civitai.com](https://civitai.com)** — generate new keys, revoke old ones, and copy tokens from there. Treat API keys like passwords: never commit them to source control, and rotate them if you suspect exposure.

## Using the token

```http
Authorization: Bearer <your-token>
```

All requests go to `https://orchestration.civitai.com`.

## Try It in the docs

Most pages on this site have a **Run** widget under each example. Click the **Token** button in the top-right of the navbar to paste your Bearer token; it's stored in your browser's `localStorage` and used for every Run / Reference Try-It on the site. The token never leaves your browser except in the `Authorization` header it sends to `orchestration.civitai.com`.

The widget supports:

* **Preview cost** — submits with `whatif=true`, shows a per-currency Buzz breakdown.
* **Submit for real** — runs the workflow with `wait=90`, then polls [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) until terminal.
* **Inline preview** — generated images and videos render in the page once the workflow finishes.

Reference operation pages have their own playground panel from the OpenAPI viewer (with its own auth field — paste once, persists across reloads).

::: info Stub
Expand once finalized: token scopes, rate limits per tier, rotation policy, how to request elevated access.
:::

---

---
url: /site/guide/authentication.md
description: How to authenticate with the Civitai site API using bearer tokens.
---

# Authentication

The Civitai site API uses **bearer tokens** generated from your account
settings. A single token covers every endpoint that accepts authentication.

::: info Building a third-party app?
Use [OAuth](/site/oauth/) instead of personal API keys — users authorize
your app explicitly with the scopes it needs and can revoke it any time,
without rotating anything on your side.
:::

## How to pass the token

Two methods are supported. The header form is strongly preferred; the
query-param form exists mainly for download-tool compatibility and leaks the
token into access logs and caches.

### Authorization header (preferred)

```bash
curl -H "Authorization: Bearer $CIVITAI_TOKEN" \
  "https://civitai.com/api/v1/me"
```
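The same header form in Python, as a minimal sketch using only the standard library. `authed_request` is a hypothetical helper name; the request is built but not sent, so the snippet runs without network access.

```python
import os
import urllib.request

API_ROOT = "https://civitai.com/api/v1"


def authed_request(path: str, token: str) -> urllib.request.Request:
    """Build a request that carries the token in the Authorization header."""
    return urllib.request.Request(
        f"{API_ROOT}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


# Read the token from the environment instead of hard-coding it.
req = authed_request("/me", os.environ.get("CIVITAI_TOKEN", "<your-token>"))
print(req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```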

### Query parameter

```bash
curl "https://civitai.com/api/v1/me?token=$CIVITAI_TOKEN"
```

## Which endpoints require a token?

Endpoints fall into three categories:

| Category | Behavior without a token | Examples |
|---|---|---|
| **Public** | Full access. | `GET /creators`, `GET /tags`, `GET /images`, `GET /models/{id}`, `GET /model-versions/*` |
| **Mixed** | Accessible, but some filter params or fields may be unavailable. | `GET /models` (the `favorites` and `hidden` query params require auth) |
| **Authenticated** | `401 Unauthorized`. | `GET /me` |

Each page in the [Reference](../reference/) notes which category an endpoint falls into.

## What 401 looks like

Calling an authenticated endpoint without a token — or with an invalid one —
returns:

```
HTTP/2 401
Content-Type: application/json

{"error":"Unauthorized"}
```

Mixed endpoints silently degrade to anonymous access when no token is
provided; they only return 401 if you pass an auth-only filter (e.g.
`?favorites=true`) without a valid session.

## Caching and auth

Public endpoints set `Cache-Control: public, s-maxage=300, stale-while-revalidate=150` —
responses are cached for 5 minutes at the edge. When you call an endpoint *with*
a valid token, caching is skipped so personalized responses aren't shared.
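
If a client wants to honor the edge lifetime itself, the header can be split into directives. This is a minimal sketch, not a full HTTP caching parser; `parse_cache_control` is a hypothetical helper name.

```python
from __future__ import annotations


def parse_cache_control(header: str) -> dict[str, int | str | bool]:
    """Split a Cache-Control header into a directive map.

    Valueless directives map to True, numeric values to int seconds,
    anything else to the raw string with surrounding quotes removed.
    """
    directives: dict[str, int | str | bool] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        if not sep:
            directives[name.lower()] = True
        elif value.isdigit():
            directives[name.lower()] = int(value)
        else:
            directives[name.lower()] = value.strip('"')
    return directives
```

For the header quoted above, `s-maxage` parses to `300` seconds, which is the 5-minute edge lifetime.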

CORS is open for public endpoints (`Access-Control-Allow-Origin: *`);
authenticated requests are restricted to Civitai-owned origins.

## Security tips

* Tokens are account-scoped. Rotating one means rotating everywhere it's used.
* If you suspect a leak, delete the key from your [account settings](https://civitai.com/user/account) and issue a new one.
* Prefer the `Authorization` header over `?token=`; query params end up in server logs, browser history, and proxy caches.
* Never embed a token in client-side code shipped to browsers or mobile apps.

---

---
url: /site/oauth/buzz-limits.md
description: >-
  How Civitai users cap an OAuth app's buzz spending, and what your app should
  expect at runtime.
---

# Buzz spend limits

OAuth tokens that include `AIServicesWrite` authorize your app to spend
the user's buzz on AI services (generation, training, scanning). To keep
that authorization sane, the consent flow lets users cap how much an app
can spend, and they can change the cap later from civitai.com.

Your app doesn't set or change the limit — the user does — but knowing
what they see and how it surfaces at runtime will save you a lot of debugging.

::: info Scope of the cap
Per-app buzz caps are enforced by the orchestrator, so they only apply to
**orchestrator-mediated spend** — every AI-services call your token makes.
Other buzz-spending scopes that an OAuth token can carry (notably
`BountiesWrite`, which lets the user create bounties) are gated by the
user's overall balance but are **not** subject to the per-app cap.
:::

## How users set a limit

When the user reaches the consent screen for a request whose scopes include
`AIServicesWrite`, Civitai shows a budget control alongside the scope list.
The current UI exposes a single "sliding window" budget — buzz limit + period
— but the underlying schema is more flexible.

After consent, users manage existing limits from **Account → Connected
Apps**. They can:

* Edit the limit per app.
* Remove the limit entirely (no cap).
* Revoke the app outright (which is a stronger action — invalidates all
  the app's tokens).

## Budget shape

Limits are stored as an array of budgets. Each budget is one of:

| Type | Fields | Meaning |
|---|---|---|
| `absolute` | `limit`, optional `currencies` | Hard cap. Once hit, no more spending on those currencies until the user resets. |
| `sliding` | `limit`, `unit`, `window`, optional `currencies` | Rolling window — e.g. `unit: 7, window: "day"` is "no more than `limit` in any 7-day stretch." This is what the simple UI ships. |
| `rollover` | `limit`, `cron`, optional `currencies` | Calendar-based reset on a cron expression (e.g. monthly reset on the 1st). |

`currencies` (when set) restricts the budget to specific buzz pools — leave
it off and the budget covers every buzz currency.

Your app **doesn't read** this structure directly — it's stored per-user
and enforced server-side. You'll only ever see its effect: spend calls
succeed or fail.
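
To make the three shapes concrete, here is a sketch that models them as Python dataclasses. Your app never receives this structure, so the class names, the `describe` helper, and the sample cron string are purely illustrative; only the field names come from the table above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class AbsoluteBudget:
    limit: int
    currencies: Optional[Tuple[str, ...]] = None  # None covers every buzz currency


@dataclass(frozen=True)
class SlidingBudget:
    limit: int
    unit: int    # e.g. 7
    window: str  # e.g. "day"
    currencies: Optional[Tuple[str, ...]] = None


@dataclass(frozen=True)
class RolloverBudget:
    limit: int
    cron: str    # e.g. "0 0 1 * *" (midnight on the 1st of each month)
    currencies: Optional[Tuple[str, ...]] = None


Budget = Union[AbsoluteBudget, SlidingBudget, RolloverBudget]


def describe(budget: Budget) -> str:
    """Render a budget as the sentence a user might see."""
    if isinstance(budget, AbsoluteBudget):
        return f"hard cap of {budget.limit} buzz until the user resets it"
    if isinstance(budget, SlidingBudget):
        return f"at most {budget.limit} buzz in any {budget.unit}-{budget.window} window"
    return f"at most {budget.limit} buzz per period, reset on cron '{budget.cron}'"
```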

## What your app sees at runtime

When the orchestrator blocks a spend — for either "user is broke" **or**
"user's per-app cap is hit" — Civitai surfaces it the same way:

```json
{
  "code": "BAD_REQUEST",
  "message": "Hey buddy, seems like you don't have enough funds to perform this action."
}
```

(The `message` may be replaced by an orchestrator-provided detail string
when a per-app limit is what tripped the call — but the response **code is
the same** either way.)

There's no separate error code that lets you distinguish "out of buzz"
from "capped by the user". If you need to give a precise message to the
user, parse `message` defensively, or check the user's per-app spend
state via [`GET /api/v1/me`](../reference/users) ahead of the call and
present a likely-cause hint based on whether a limit is set.

::: warning Don't rely on message text for programmatic decisions
The exact default message string above comes from the
[`throwInsufficientFundsError`](https://github.com/civitai/civitai)
helper and may change. Treat anything beyond the HTTP/RPC code as
human-readable only.
:::
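
One way to follow that advice is to branch only on `code` and on whether the user has a limit set, and to pass `message` through for display. The sketch below assumes you have already worked out `cap_is_set` yourself; the function name and hint wording are hypothetical.

```python
def spend_failure_hint(error: dict, cap_is_set: bool) -> str:
    """Return a likely-cause hint for a blocked spend call.

    Decisions use only `code`; `message` is never parsed. A BAD_REQUEST
    can have other causes, so the hints are deliberately phrased as guesses.
    """
    if error.get("code") != "BAD_REQUEST":
        return "Unexpected error; show the message and log the details."
    if cap_is_set:
        return "Spend blocked: you may have reached the cap set for this app, or run out of buzz."
    return "Spend blocked: you are likely out of buzz."
```

Because the hint is only a guess, show the server-provided `message` alongside it instead of replacing it.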

## Best practices for buzz-spending clients

* **Surface the user's balance.** Call
  [`GET /api/v1/me`](../reference/users) periodically and show buzz in
  your UI — users hate guessing whether their next click will be denied.
* **Use `whatif=true` for cost preview**, not for limit detection. The
  orchestration `whatif` mechanism ([see the orchestration guide](../../orchestration/guide/submitting-work))
  is designed to give you a per-currency cost breakdown before you submit
  for real; treat it as a costing tool, not a "will this be denied?" oracle.
* **Don't retry on insufficient-funds errors.** Whether it's a real shortfall
  or the user's per-app cap, retrying won't help until balance or limits
  change. Show the user the error and let them resolve it.
* **Treat token revocation as expected.** A user who hits their cap may
  decide to revoke your app entirely from civitai.com. Your refresh-token
  call will return `invalid_grant`; handle that by sending the user back
  through `/authorize` (with messaging that explains why).
* **Never persist budget assumptions across sessions.** Users can change
  their cap any time; treat each spend call as the source of truth.

## When you don't need buzz scopes

If your app doesn't spend buzz on the user's behalf — e.g. a read-only
analytics dashboard, or one that submits work using **your own**
`client_credentials` token — don't request `AIServicesWrite`. Users won't
see the buzz-cap UI, and you skip a whole category of failure modes.

---

---
url: /orchestration/recipes/chat-completion.md
---

# Chat completion

`chatCompletion` routes text (and optionally image) inputs through large language models. Any model available on [OpenRouter](https://openrouter.ai/models) is supported, plus Civitai-hosted AIR models. The request and response shapes follow the OpenAI Chat Completions API.

## Access paths

Two ways to use chat completion, depending on your use case:

| Path | When to use |
|------|-------------|
| **`POST /v1/chat/completions`** | Drop-in replacement for the OpenAI API. Accepts `stream: true` for SSE streaming. |
| **`chatCompletion` workflow step** | Chain with other steps (`imageGen`, `convertImage`, etc.) in a multi-step workflow. |

Both paths share the same input schema and produce the same output format.

## Basic text completion

### Via the OpenAI-compatible endpoint

```http
POST https://orchestration.civitai.com/v1/chat/completions
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "model": "openai/gpt-4o-mini",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ]
}
```

### Via SubmitWorkflow

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "chatCompletion",
    "input": {
      "model": "openai/gpt-4o-mini",
      "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "What is the capital of France?" }
      ]
    }
  }]
}
```
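
Because both paths take the same input, an existing chat request body can be wrapped into a workflow step mechanically. A minimal sketch; `as_workflow` is a hypothetical helper name, and the envelope mirrors the request shown above.

```python
def as_workflow(chat_input: dict) -> dict:
    """Wrap a chat-completions request body as a one-step workflow."""
    return {
        "steps": [{
            "$type": "chatCompletion",
            "input": dict(chat_input),  # shallow copy so the caller's dict is not shared
        }]
    }
```

The result can be sent as the JSON body of the `POST /v2/consumer/workflows` call.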

## Vision (image inputs)

Pass images in user message content parts. Any vision-capable model (e.g. `openai/gpt-4o`, `google/gemini-2.0-flash`) can process them.

```json
{
  "model": "openai/gpt-4o",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "text", "text": "Describe this image in detail." },
      {
        "type": "image_url",
        "image_url": {
          "url": "https://image.civitai.com/.../photo.jpeg",
          "detail": "auto"
        }
      }
    ]
  }],
  "max_tokens": 300
}
```

`detail` can be `"auto"` (default), `"low"`, or `"high"`. The image source can be a public URL, a data URL (`data:image/jpeg;base64,...`), or raw Base64 — the orchestrator uploads it to blob storage before dispatching the job.
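
For local files, the data-URL form is straightforward to build. The sketch below uses hypothetical helper names (`to_data_url`, `image_part`) and validates `detail` against the three values listed above.

```python
import base64

ALLOWED_DETAIL = ("auto", "low", "high")


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(url: str, detail: str = "auto") -> dict:
    """Build an `image_url` content part for a user message."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}
```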

## Image generation

Set `"modalities": ["image", "text"]` on the request to generate images through `/v1/chat/completions`. The response carries an `images` array on the assistant message, where each entry is a base64 data URI — the same shape OpenRouter uses, so existing OpenRouter-style SDK code works unmodified.

```http
POST https://orchestration.civitai.com/v1/chat/completions
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "model": "google/gemini-2.5-flash-image",
  "messages": [
    { "role": "user", "content": "A cat in a teacup, soft window light" }
  ],
  "modalities": ["image", "text"],
  "image_config": {
    "aspect_ratio": "1:1",
    "image_size": "1K"
  }
}
```

Response:

```json
{
  "id": "chatcmpl-...",
  "model": "google/gemini-2.5-flash-image",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "",
      "images": [{
        "type": "image_url",
        "image_url": { "url": "data:image/png;base64,iVBOR..." }
      }]
    },
    "finish_reason": "stop"
  }]
}
```
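
Reading the images back out means stripping the data-URI prefix and decoding the base64 payload. A sketch, assuming every entry has the shape shown in the response above; `decode_images` is a hypothetical name.

```python
from __future__ import annotations

import base64


def decode_images(response: dict) -> list[bytes]:
    """Return the raw bytes of every image on every choice's message."""
    images: list[bytes] = []
    for choice in response.get("choices", []):
        for entry in choice.get("message", {}).get("images", []):
            url = entry["image_url"]["url"]
            header, _, payload = url.partition(",")
            if not header.startswith("data:") or not header.endswith(";base64"):
                raise ValueError("expected a base64 data URI")
            images.append(base64.b64decode(payload))
    return images
```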

### Image editing (multi-turn)

Pass a prior generated image (or any image URL / data URI) as a content part on a user message and the request routes through the engine's edit operation:

```json
{
  "model": "google/gemini-2.5-flash-image",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "text", "text": "Make it a dog instead of a cat." },
      { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
    ]
  }],
  "modalities": ["image", "text"]
}
```

### Supported models

| `model` | Engine | Operations |
|---------|--------|------------|
| `google/gemini-2.5-flash-image` | Gemini 2.5 Flash Image | create, edit |
| `openai/gpt-image-1` | OpenAI gpt-image-1 | create, edit |
| `openai/dall-e-3` | OpenAI DALL·E 3 | create |
| `openai/dall-e-2` | OpenAI DALL·E 2 | create, edit |
| `black-forest-labs/flux.2-dev` | Flux 2 Dev | create, edit |
| `black-forest-labs/flux.2-flex` | Flux 2 Flex | create, edit |
| `black-forest-labs/flux.2-pro` | Flux 2 Pro | create, edit |
| `black-forest-labs/flux.2-max` | Flux 2 Max | create, edit |
| `black-forest-labs/flux.2-klein` | Flux 2 Klein | create, edit |

The provider prefix (`google/`, `openai/`, `black-forest-labs/`) is optional — short names like `gemini-2.5-flash-image`, `gpt-image-1`, `flux-2-dev` are also accepted. Unknown model names with `modalities: ["image"]` return `400` with the supported list.

### Civitai AIR URNs

Pass a Civitai [AIR](/site/guide/air) URN as `model` to use a community checkpoint. The ecosystem segment of the AIR (`sd1`, `sdxl`, `flux1`, `anima`) selects the engine; the AIR is forwarded as the checkpoint:

```json
{
  "model": "urn:air:sdxl:checkpoint:civitai:101055@128078",
  "messages": [{ "role": "user", "content": "A cyberpunk samurai" }],
  "modalities": ["image", "text"],
  "image_config": { "aspect_ratio": "1:1", "image_size": "1K" }
}
```

| Ecosystem | Engine | Operations | Notes |
|-----------|--------|------------|-------|
| `sd1` | SD 1.5 (sd-cpp) | create, variant | Pass an input image to trigger img2img variant. |
| `sdxl` | SDXL (sd-cpp) | create, variant | Same — img2img variant when an input image is supplied. |
| `flux1` | Flux 1 (sd-cpp) | create, edit | Edit operation accepts up to 2 input images; width/height clamped to 832–1216. |
| `anima` | Anima (sd-cpp) | create | Anima checkpoints; no img2img path through chat-completions. |

Other ecosystems (`zimage`, `qwen`, `wan`, `flux2`) hardcode their checkpoints — pass the matching named model instead (e.g. `flux-2-dev`) and use the [`imageGen` workflow step](./flux2) directly when you need to override the checkpoint.

### `image_config`

| Field | Values | Effect |
|-------|--------|--------|
| `aspect_ratio` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `21:9` | Sets width/height ratio. OpenAI engines snap to their nearest allowed size. Gemini ignores this field (always 1024×1024). |
| `image_size` | `0.5K`, `1K`, `2K`, `4K` | Approximate megapixel target. Engines clamp to their supported range. |
| `n` | 1–10 | Number of images. Falls back to the top-level `n`. Engines clamp to their supported max. |

For full per-engine knobs (samplers, LoRAs, guidance scales, advanced operations), use the [`imageGen` workflow step](./flux2) directly instead — chat-completions is a thin facade tuned for SDK compatibility, not a full passthrough of every engine parameter.

## Multi-turn conversations

Include prior turns as `assistant` messages to maintain context:

```json
{
  "model": "openai/gpt-4o-mini",
  "messages": [
    { "role": "system",    "content": "You are a concise assistant." },
    { "role": "user",      "content": "Write a haiku about the ocean." },
    { "role": "assistant", "content": "Waves crash endlessly,\nSalt..." },
    { "role": "user",      "content": "Now write one about mountains." }
  ],
  "temperature": 0.7
}
```

## Streaming

### Via `/v1/chat/completions`

Set `"stream": true` and handle Server-Sent Events (SSE). The response is a stream of `data: {...}` lines ending with `data: [DONE]`:

```http
POST https://orchestration.civitai.com/v1/chat/completions
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "model": "openai/gpt-4o-mini",
  "messages": [{ "role": "user", "content": "Tell me a short story." }],
  "stream": true
}
```
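
Consuming the stream comes down to reading `data:` lines until the `[DONE]` sentinel. The sketch below works on any iterable of decoded lines, so it is independent of the HTTP client. It assumes each chunk carries text under `choices[].delta.content`, as in OpenAI's streaming format; this page does not spell out the chunk shape, so verify against a real response.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the streamed text deltas (assumed OpenAI chunk shape)."""
    parts = []
    for chunk in iter_sse_payloads(lines):
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)
```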

### Via workflow step

Set `stream: true` in the step `metadata` field:

```json
{
  "steps": [{
    "$type": "chatCompletion",
    "metadata": { "stream": true },
    "input": {
      "model": "openai/gpt-4o-mini",
      "messages": [{ "role": "user", "content": "Tell me a short story." }]
    }
  }]
}
```

When streaming is enabled, the orchestrator stores the raw NDJSON chunks in a streaming blob and assembles them into the standard `ChatCompletionOutput` shape for the workflow output.

## Tool use (function calling)

Define tools as JSON Schema function definitions. The model decides when and how to call them:

```json
{
  "model": "openai/gpt-4o",
  "messages": [
    { "role": "user", "content": "What is the weather in Paris?" }
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": { "type": "string" }
        },
        "required": ["city"]
      }
    }
  }],
  "tool_choice": "auto"
}
```

When the model calls a tool, the assistant message in the response contains a `tool_calls` array instead of (or alongside) `content`. Submit the tool result back as a `tool` message:

```json
{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 18, \"condition\": \"sunny\"}"
}
```
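
The round trip can be sketched as a small dispatcher that runs each requested function locally and produces the matching `tool` messages. `run_tool_calls` and the registry are hypothetical names; the message shapes follow the examples on this page.

```python
from __future__ import annotations

import json
from typing import Callable


def run_tool_calls(assistant_message: dict, registry: dict[str, Callable[..., object]]) -> list[dict]:
    """Execute every tool call on an assistant message and build the replies."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = registry[call["function"]["name"]]
        # `arguments` arrives as a JSON-encoded string, not an object.
        args = json.loads(call["function"]["arguments"])
        result = fn(**args)
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return replies
```

Append the assistant message and then the returned `tool` messages to `messages` before sending the next request.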

## Model selection

`model` accepts any string that identifies a model on OpenRouter or a Civitai AIR URI:

| Format | Example | Notes |
|--------|---------|-------|
| OpenRouter ID | `openai/gpt-4o-mini` | Any model from [openrouter.ai/models](https://openrouter.ai/models). |
| OpenAI shorthand | `gpt-4o`, `gpt-4o-mini` | OpenRouter also accepts bare OpenAI model names. |
| AIR URI | `urn:air:llm:model:civitai:<modelId>@<versionId>` | Routes to a Civitai-hosted model. |

## Parameters reference

| Field | Default | Notes |
|-------|---------|-------|
| `model` | — ✅ | Model ID (OpenRouter) or AIR URI. |
| `messages` | — ✅ | Array of role-discriminated messages (at least 1). |
| `temperature` | `1` | 0–2. Higher = more random output. |
| `topP` | `1` | 0–1. Nucleus sampling. Alternative to `temperature`; usually set one or the other. |
| `maxTokens` | `null` | Max output tokens, 1–128 000. Unlimited when omitted. |
| `n` | `1` | Number of completions to generate, 1–128. |
| `stop` | `null` | Up to 4 stop sequences. |
| `presencePenalty` | `0` | -2 to 2. Positive values discourage repeating topics. |
| `frequencyPenalty` | `0` | -2 to 2. Positive values discourage repeating exact tokens. |
| `seed` | `null` | Integer seed for deterministic output (beta). |
| `user` | `null` | End-user identifier for abuse monitoring. |
| `logprobs` | `null` | Return log probabilities for generated tokens. |
| `topLogprobs` | `null` | 0–20. Number of top log-prob candidates per token (requires `logprobs: true`). |
| `tools` | `null` | Function definitions available to the model. |
| `tool_choice` | `null` | `"auto"`, `"none"`, `"required"`, or `{ "type": "function", "function": { "name": "..." } }`. |
| `chatTemplateKwargs` | `null` | Extra kwargs passed to the model's chat template (vLLM-specific). |
| `modalities` | `null` | Output modalities. Include `"image"` to route the request through the image-generation pipeline. See [Image generation](#image-generation). |
| `imageConfig` | `null` | Image-generation parameters (`aspect_ratio`, `image_size`, `n`). Only consulted when `modalities` includes `"image"`. |

## Messages reference

Messages are discriminated by the `role` field:

### `system`

```json
{ "role": "system", "content": "You are a helpful assistant.", "name": "optional" }
```

### `user`

Content can be a plain string or an array of content parts:

```json
{ "role": "user", "content": "Plain text" }
```

```json
{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "https://...", "detail": "auto" } }
  ]
}
```

### `assistant`

```json
{ "role": "assistant", "content": "Prior response text." }
```

Or with tool calls (as returned by the model):

```json
{
  "role": "assistant",
  "content": null,
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": { "name": "get_weather", "arguments": "{\"city\":\"Paris\"}" }
  }]
}
```

### `tool`

```json
{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 18}"
}
```

## Reading the result

The output is an OpenAI-compatible `chat.completion` object:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "chatCompletion",
    "status": "succeeded",
    "output": {
      "id": "chatcmpl-...",
      "object": "chat.completion",
      "created": 1748000000,
      "model": "openai/gpt-4o-mini",
      "choices": [{
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "The capital of France is Paris."
        },
        "finish_reason": "stop"
      }],
      "usage": {
        "prompt_tokens": 24,
        "completion_tokens": 9,
        "total_tokens": 33
      }
    }
  }]
}
```

The `/v1/chat/completions` endpoint returns the `output` object directly (not wrapped in a workflow envelope).
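
A client that supports both paths can normalize the two shapes before reading the text. A minimal sketch; `completion_text` is a hypothetical helper, and it reads the first choice only.

```python
from __future__ import annotations


def completion_text(payload: dict, step_index: int = 0) -> str | None:
    """Return the first choice's text from either response shape.

    Workflow responses wrap the completion in steps[i].output; the
    /v1/chat/completions endpoint returns the completion directly.
    """
    output = payload["steps"][step_index]["output"] if "steps" in payload else payload
    return output["choices"][0]["message"].get("content")
```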

## Cost

Cost depends on whether the model routes through OpenRouter or is a Civitai AIR model.

### OpenRouter models

Cost is computed from actual token usage with a **30% margin**, converted to Buzz (1 000 Buzz = 1 USD):

```
buzzCost = actualCostUsd × 1000 × 1.3   (minimum 1 Buzz)
```

Before execution, the orchestrator estimates cost using OpenRouter's published per-token prices. After execution, the final Buzz charge is based on the tokens actually consumed by the model.

Different models have very different per-token prices — check [openrouter.ai/models](https://openrouter.ai/models) for current pricing. Representative examples:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Typical single call |
|-------|-----------------------|------------------------|---------------------|
| `openai/gpt-4o-mini` | $0.15 | $0.60 | 1 Buzz (minimum charge) |
| `openai/gpt-4o` | $2.50 | $10.00 | 2–15 Buzz |
| `anthropic/claude-3-5-sonnet` | $3.00 | $15.00 | 4–20 Buzz |
| `meta-llama/llama-3.3-70b-instruct` | $0.12 | $0.30 | 1 Buzz (minimum charge) |

Use `whatif=true` on your first request to preview the estimated cost before committing; the final charge follows the tokens actually consumed.
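
The arithmetic can be sketched as follows. Rounding is not specified on this page, so the helper returns an unrounded float and applies only the 1-Buzz floor; the function names are illustrative.

```python
def token_cost_usd(prompt_tokens: int, completion_tokens: int,
                   input_per_million: float, output_per_million: float) -> float:
    """USD cost of one call from per-million-token prices."""
    return (prompt_tokens * input_per_million
            + completion_tokens * output_per_million) / 1_000_000


def openrouter_buzz_cost(actual_cost_usd: float) -> float:
    """Apply 1 000 Buzz per USD, the 30% margin, and the 1-Buzz minimum."""
    return max(1.0, actual_cost_usd * 1000 * 1.3)
```

For example, 24 prompt tokens and 9 completion tokens at `openai/gpt-4o` prices come to $0.00015, which is about 0.2 Buzz before the floor, so the call is charged the 1-Buzz minimum.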

### AIR models (Civitai-hosted)

Flat-rate pricing based on image count and number of completions requested:

```
total = 1 × (imageCount × 2) × n
```

::: warning Known limitation
For text-only requests to AIR models (`imageCount = 0`), the `imageCount` factor collapses the product to **0 Buzz**. This is a known bug; expect it to be corrected in a future release. Until then, text-only calls to AIR models cost 0 Buzz.
:::
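
The flat-rate formula, written out as a sketch so the zero-collapse is easy to see (`air_buzz_cost` is a hypothetical name):

```python
def air_buzz_cost(image_count: int, n: int = 1) -> int:
    """total = 1 x (imageCount x 2) x n, as documented above."""
    return 1 * (image_count * 2) * n
```

With `image_count=0` the product is `0` for any `n`, which is the current text-only behavior.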

## Runtime

Most chat completions finish in 5–30 seconds depending on model and output length. Use `wait=60` for simple requests; switch to `wait=0` plus polling for long outputs, large `n`, or slow models. The `/v1/chat/completions` endpoint waits up to 60 seconds before timing out with `504`.
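
A polling loop for the `wait=0` path might look like the sketch below. It takes a `fetch` callable so any HTTP client can be plugged in, and it treats `succeeded` and `failed` (the statuses that appear on this page) as terminal. Other terminal states may exist, so check the `GetWorkflow` reference before relying on it; the interval and attempt count are arbitrary defaults.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed"}  # assumption: may be incomplete


def poll_workflow(fetch: Callable[[], dict], interval_s: float = 2.0,
                  max_attempts: int = 150,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `fetch` until the workflow reaches a terminal status."""
    for attempt in range(max_attempts):
        workflow = fetch()
        if workflow.get("status") in TERMINAL_STATUSES:
            return workflow
        if attempt < max_attempts - 1:
            sleep(interval_s)  # no sleep after the final attempt
    raise TimeoutError(f"workflow not finished after {max_attempts} polls")
```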

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "messages must not be empty" | Empty `messages` array | Include at least one message. |
| `400` with "model is required" | Missing `model` field | `model` is always required. |
| `504 Gateway Timeout` (via `/v1`) | Slow model or long output | Retry with `wait=0` via `SubmitWorkflow` + polling. |
| `400` with "topLogprobs requires logprobs" | Sent `topLogprobs` without `logprobs: true` | Set `"logprobs": true` alongside `topLogprobs`. |
| Response truncated mid-sentence | `maxTokens` reached | Raise `maxTokens` or omit it to let the model decide. |
| Tool call in response instead of content | Expected behaviour | The model chose to call a tool — feed the `tool_calls` back as a `tool` message in the next turn. |
| Step `failed`, `reason = "no_provider_available"` | AIR model offline or no worker available | Retry shortly. |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — for the workflow-step path
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling
* [Prompt enhancement](./prompt-enhancement) — uses a `chatCompletion`-like step to rewrite image prompts
* [Image conversion](./convert-image) — 1-Buzz utility step to post-process generated images

---

---
url: /orchestration/recipes/training-other-image.md
---

# Chroma / ERNIE / Qwen / Z-Image LoRA training

Five smaller image-LoRA ecosystems share this page: each has its own `ecosystem` value and base checkpoint, but the request shape is otherwise the AI Toolkit standard.

| `ecosystem` | Base | Buzz / epoch | Best for |
|-------------|------|--------------|----------|
| `chroma` | `lodestones/Chroma1-HD` | 200 | Chroma community model fine-tunes |
| `ernie` | `baidu/ERNIE-Image` | 100 | ERNIE Image LoRAs |
| `qwen` | Qwen-Image (versioned) | 200 | Qwen Image / Qwen-Image-Edit LoRAs |
| `zimageturbo` | `ostris/Z-Image-De-Turbo` (+ Z-Image-Turbo extras) | 100 | Z-Image Turbo LoRAs (cheap, fast inference) |
| `zimagebase` | `Tongyi-MAI/Z-Image` | 100 | Z-Image base LoRAs |

Each ecosystem has its own subsection with a runnable example. The shared schema lives in [Common parameters](#common-parameters); ecosystem-specific quirks are in each subsection.

::: tip Long-running step
Always submit with `wait=0`. These ecosystems run anywhere from ~10s/epoch (Z-Image Turbo) to ~2min/epoch (Chroma/Qwen). See [Results & webhooks](/orchestration/guide/results-and-webhooks).
:::

## The request shape

```json
{
  "$type": "training",
  "input": {
    "engine":    "ai-toolkit",
    "ecosystem": "chroma"   // chroma | ernie | qwen | zimageturbo | zimagebase
  }
}
```

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A training-data zip (signed R2 URL, Civitai R2 AIR, or any HTTPS URL)
* An accurate `count` of images in the zip

## Chroma

Trains on the Chroma1-HD base. Uses [`TextToImageV2Job`](/orchestration/reference/) for sample renders; output LoRA is usable wherever Chroma is supported.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "chroma",
      "epochs": 5,
      "resolution": 1024,
      "lr": 0.0001,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 16,
      "networkAlpha": 16,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "https://civitai-delivery-worker-prod.5ac0637cfd0766c97916cefa3764fbdf.r2.cloudflarestorage.com/training-images/5418/2382561TrainingData.B6Tr.zip",
        "count": 10
      },
      "samples": {
        "prompts": [
          "woman with red hair, playing chess at the park, dramatic explosion in background",
          "a woman holding a coffee cup, in a beanie, sitting at a cafe",
          "a horse acting as a DJ at a night club, fisheye lens, smoke machine, laser lights"
        ]
      }
    }
  }]
}
```

Chroma defaults: `networkDim: 16`, `optimizerType: adamw8bit`, `trainTextEncoder: false`, `lrScheduler: cosine`. 200 Buzz / epoch.

## ERNIE

Trains on Baidu's ERNIE-Image. Comfy-based ecosystem with built-in diffuser. Uses [`ComfyImageGenJob`](/orchestration/reference/) for sample renders.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "ernie",
      "epochs": 5,
      "lr": 0.0001,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/7918795/2435272TrainingData.bJ7P.zip",
        "count": 10
      },
      "samples": {
        "prompts": ["a portrait of TOK", "TOK walking through a comic book city"]
      }
    }
  }]
}
```

ERNIE defaults: `networkDim: 32`, `optimizerType: adamw8bit`, `trainTextEncoder: false`, `lrScheduler: cosine`. 100 Buzz / epoch.

## Qwen

Trains on Qwen-Image. The `version` field selects a specific Qwen-Image release:

| `version` | Base resolved to |
|-----------|------------------|
| `latest` (default) | `Qwen/Qwen-Image-Edit-2512` |
| `2509` | `urn:air:qwen:checkpoint:civitai:1864281@2110043` |
| `2512` | `Qwen/Qwen-Image-Edit-2512` (same as `latest`) |

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "qwen",
      "version": "latest",
      "epochs": 1,
      "resolution": 1024,
      "lr": 0.00011,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 16,
      "networkAlpha": 16,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/3315022/2526079TrainingData.o4S8.zip",
        "count": 10
      },
      "samples": {
        "prompts": [
          "woman with red hair, playing chess at the park, dramatic explosion in background",
          "a woman holding a coffee cup, in a beanie, sitting at a cafe"
        ]
      }
    }
  }]
}
```

Qwen defaults: `networkDim: 16`, `optimizerType: adamw8bit`, `trainTextEncoder: false`, `lrScheduler: cosine`. 200 Buzz / epoch.

## Z-Image Turbo

Trains on `ostris/Z-Image-De-Turbo` and pulls in the original `Tongyi-MAI/Z-Image-Turbo` as an extras model. Output LoRA is usable in [Z-Image generation](./zimage) on the `turbo` model.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "zimageturbo",
      "epochs": 7,
      "resolution": 512,
      "lr": 0.000611,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/3315022/2526079TrainingData.o4S8.zip",
        "count": 10
      },
      "samples": {
        "prompts": ["a photo of TOK", "TOK in a garden", "TOK portrait"]
      }
    }
  }]
}
```

Z-Image Turbo defaults: `networkDim: 32`, `optimizerType: adamw8bit`, `trainTextEncoder: false`. 100 Buzz / epoch.

## Z-Image Base

Trains on `Tongyi-MAI/Z-Image`. The orchestrator overrides `optimizerType` to `automagic` and `lr` to `0.000001` regardless of what you submit — the input fields are accepted but ignored. Use the [Z-Image Turbo](#z-image-turbo) recipe instead unless you specifically need a base-model LoRA.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "zimagebase",
      "epochs": 7,
      "resolution": 512,
      "lr": 0.000611,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/3315022/2526079TrainingData.o4S8.zip",
        "count": 10
      },
      "samples": {
        "prompts": ["a photo of TOK", "TOK in a garden", "TOK portrait"]
      }
    }
  }]
}
```

Z-Image Base defaults: `networkDim: 32`, `optimizerType: automagic` (overridden), `lr: 0.000001` (overridden), `trainTextEncoder: false`. 100 Buzz / epoch.

## Common parameters {#common-parameters}

Defaults shown are the post-`ApplyDefaults` values; per-ecosystem deviations are noted above.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `engine` | ✅ | — | Always `ai-toolkit`. |
| `ecosystem` | ✅ | — | One of: `chroma`, `ernie`, `qwen`, `zimageturbo`, `zimagebase`. |
| `version` | (qwen only) | `latest` | `latest`, `2509`, `2512`. Selects the Qwen-Image base release. |
| `epochs` | | `5` | `1`–`20`. Billed per epoch. |
| `numberOfRepeats` | | varies (see ecosystem) | `1`–`5000`. ERNIE / Z-Image auto-derive `ceil(200 / count)`; Chroma / Qwen don't auto-set. |
| `lr` | | `0.0001` | UNet learning rate. |
| `trainTextEncoder` | | `false` | All five ecosystems leave the text encoder frozen. |
| `lrScheduler` | | `cosine` | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. |
| `optimizerType` | | `adamw8bit` (`automagic` for Z-Image Base) | Full enum on the [SDXL/SD1 page](./training-sdxl-sd1#common-parameters). |
| `networkDim` | | `32` (`16` for Chroma / Qwen) | `1`–`256`. |
| `networkAlpha` | | matches `networkDim` | `1`–`256`. |
| `noiseOffset` | | `0` | `0`–`1`. |
| `flipAugmentation` | | `false` | Random horizontal flips. |
| `shuffleTokens` / `keepTokens` | | `false` / `0` | Caption-tag shuffling. |
| `triggerWord` | | *(none)* | Activation token. Recommended for character / style LoRAs on Chroma, Z-Image. |
| `trainingData.{type, sourceUrl, count}` | ✅ | — | `type: "zip"`. |
| `samples.prompts[]` | | `[]` | Per-epoch preview prompts rendered with the trained LoRA. |
| `samples.negativePrompt` | | *(none)* | — |
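
The `numberOfRepeats` auto-derivation for ERNIE and Z-Image is plain arithmetic. A minimal sketch, assuming `count` is the `trainingData.count` value; the function name is illustrative and not part of the API:

```python
import math


def auto_repeats(count: int) -> int:
    """Mirror the documented ceil(200 / count) rule (ERNIE / Z-Image only)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return math.ceil(200 / count)


print(auto_repeats(10))  # 20
print(auto_repeats(30))  # 7
```

Chroma and Qwen don't auto-set the value, so this helper would not apply to them.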

## Reading the result

Same envelope as the other training recipes — see [SDXL/SD1 → Reading the result](./training-sdxl-sd1#reading-the-result). Each epoch yields a `.safetensors` LoRA blob plus any sample images.

The trained LoRA is usable in the corresponding generation recipe — Chroma LoRAs in any Chroma workflow, ERNIE LoRAs in [ERNIE image generation](./ernie), Qwen LoRAs in [Qwen image generation](./qwen), Z-Image LoRAs in [Z-Image generation](./zimage).

## Runtime

Per-epoch wall time, default settings on a 10-image dataset:

| Ecosystem | Per-epoch | Typical full run |
|-----------|-----------|-------------------|
| `chroma` | ~60–120 s | 5–15 min for 5 epochs |
| `ernie` | ~30–60 s | 3–8 min for 5 epochs |
| `qwen` | ~60–120 s | 5–15 min for 5 epochs |
| `zimageturbo` | ~10–25 s | 1–4 min for 7 epochs |
| `zimagebase` | ~10–25 s | 1–4 min for 7 epochs |

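Always submit with `wait=0`. Even the fastest runs above take about a minute and most take several, so poll [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) or register a webhook rather than holding the request open.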

## Cost

```
total = costPerEpoch × epochs
```

| Ecosystem | Buzz / epoch | `epochs: 5` | `epochs: 10` |
|-----------|--------------|-------------|--------------|
| `chroma` | 200 | 1000 | 2000 |
| `ernie` | 100 | 500 | 1000 |
| `qwen` | 200 | 1000 | 2000 |
| `zimageturbo` | 100 | 500 | 1000 |
| `zimagebase` | 100 | 500 | 1000 |

Sample-prompt rendering is billed separately at each ecosystem's image-generation rate. Use `whatif=true` (the **Preview cost** button on the widgets above) to confirm exact charges before submitting.
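
As a rough local estimate before calling `whatif=true`, the table can be encoded directly. This is a sketch only: the helper name is hypothetical, the rates are copied from the table above, and sample-prompt rendering is deliberately left out because it is billed separately.

```python
# Buzz per epoch, copied from the cost table above.
BUZZ_PER_EPOCH = {
    "chroma": 200,
    "ernie": 100,
    "qwen": 200,
    "zimageturbo": 100,
    "zimagebase": 100,
}


def training_cost(ecosystem: str, epochs: int = 5) -> int:
    """total = costPerEpoch x epochs (sample rendering not included)."""
    if not 1 <= epochs <= 20:
        raise ValueError("epochs must be between 1 and 20")
    return BUZZ_PER_EPOCH[ecosystem] * epochs


print(training_cost("qwen", 10))  # 2000
```

Treat the result as a sanity check; the `whatif=true` response remains the authoritative figure.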

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "ecosystem unknown" | Typo, or not one of `chroma` / `ernie` / `qwen` / `zimageturbo` / `zimagebase` | Check spelling. |
| `400` with "version not allowed" (Qwen only) | `version` not one of `latest` / `2509` / `2512` | Use one of the listed values. |
| Z-Image Base: `optimizerType` you set seems ignored | Intentional — `ApplyDefaults` overrides to `automagic` | Use Z-Image Turbo if you need full optimizer control. |
| Trained LoRA underbaked | Too few epochs / too low `lr` | Raise `epochs` to 8–15 (these ecosystems often need more epochs than SDXL); keep `lr` ≤ `5e-4`. |
| Trained LoRA overcooked | Too many epochs or `networkDim` too high | Drop `networkDim` to 16, lower `epochs`. |
| Step `failed`, `moderationStatus: "Rejected"` | Dataset failed content moderation | Replace flagged images. |

## Related

* [SDXL & SD1 LoRA training](./training-sdxl-sd1) — classic Stable Diffusion ecosystems
* [Flux 1 LoRA training](./training-flux1) / [Flux 2 Klein LoRA training](./training-flux2-klein) — Flux family
* [Wan video LoRA training](./training-wan) / [LTX2 video LoRA training](./training-ltx2) — video LoRAs
* Generation recipes for these ecosystems: [Z-Image](./zimage), [Qwen](./qwen), [ERNIE](./ernie)
* [Results & webhooks](/orchestration/guide/results-and-webhooks)
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) / [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow)
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/training/openapi.yaml)

---

---
url: /site.md
description: 'REST API for browsing models, images, creators, and tags on civitai.com.'
---

# Civitai Site API

The Civitai site exposes a public REST API at `https://civitai.com/api/v1/...` for
browsing models, model versions, images, creators, and tags. It's the same
surface that powers third-party tools like Stable Diffusion downloaders and
metadata lookup utilities.

This is **not** the Orchestration API. If you want to *submit* generation work,
see the [Orchestration docs](/orchestration/).

## Where to start

* **[Guide](./guide/)** — authentication, pagination, error handling, and the
  AIR (AI Resource Identifier) format.
* **[Reference](./reference/)** — per-resource documentation for every public
  endpoint, sourced directly from the current Next.js handlers.

## Quick example

```bash
# Public — no auth required
curl "https://civitai.com/api/v1/models?limit=1&types=LORA"

# Authenticated — pass a Civitai API token
curl -H "Authorization: Bearer $CIVITAI_TOKEN" \
  "https://civitai.com/api/v1/me"
```

See [Getting started](./guide/getting-started) for a full walkthrough.

---

---
url: /site/reference/creators.md
description: List Civitai creators.
---

# Creators

Creators are users who have published at least one model on Civitai.

## List creators

```
GET /api/v1/creators
```

**Auth:** Public.

### Query parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `limit` | integer (1–200) | 20 | Number of items per page. |
| `page` | integer (≥ 1) | 1 | 1-indexed page number. |
| `query` | string | — | Full-text search on username. |

Pagination is page-based only — there is no `cursor` parameter on this
endpoint.

### Response

```json
{
  "items": [
    {
      "username": "JustMaier",
      "modelCount": 3,
      "link": "https://civitai.com/api/v1/models?username=JustMaier",
      "image": "https://image.civitai.com/.../JustMaier.jpeg"
    }
  ],
  "metadata": {
    "totalItems": 84916,
    "currentPage": 1,
    "pageSize": 1,
    "totalPages": 84916,
    "nextPage": "https://civitai.com/api/v1/creators?limit=1&page=2"
  }
}
```

### Field notes

* `link` is pre-built — follow it to list a creator's models via
  [`GET /models`](./models#list-models).
* `modelCount` is only included when greater than zero; creators with no
  published models are excluded from the listing entirely.
* `image` is null when the creator has no avatar.

### Notes

* For very deep traversals, scope with `?query=` rather than paging linearly —
  the listing is sorted alphabetically by username, so `query=A`, `query=B`,
  ... is a reliable way to walk the full set.

### Examples

```bash
# First page
curl "https://civitai.com/api/v1/creators?limit=20"

# Find a specific creator
curl "https://civitai.com/api/v1/creators?query=JustMaier"
```
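
For scripted access, the same traversal can be sketched in Python by following `metadata.nextPage` until it disappears. The helper names, the injectable `fetch` callable, and the `max_pages` guard are assumptions made for illustration; only the URL, the query parameters, and the response fields come from this page.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

BASE_URL = "https://civitai.com/api/v1/creators"


def first_page_url(limit: int = 20, query: Optional[str] = None) -> str:
    """Build the URL for the first page of results."""
    params = {"limit": limit}
    if query:
        params["query"] = query
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (public endpoint, no auth)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_creators(
    start_url: str,
    fetch: Callable[[str], dict] = http_get_json,
    max_pages: int = 5,
) -> Iterator[dict]:
    """Yield creators page by page, following metadata.nextPage."""
    url: Optional[str] = start_url
    for _ in range(max_pages):
        page = fetch(url)
        yield from page["items"]
        url = page.get("metadata", {}).get("nextPage")
        if not url:
            break


# Usage (performs real HTTP requests):
# for creator in iter_creators(first_page_url(limit=20)):
#     print(creator["username"], creator.get("modelCount"))
```

Passing `fetch` as a parameter keeps the loop testable without network access.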

---

---
url: /site/reference/enums.md
description: Valid enum values used across the Civitai site API.
---

# Enums

```
GET /api/v1/enums
```

**Auth:** Public.

Returns the current set of enum values used elsewhere in the site API — model
types, file types, base models, and their sub-types. Call this endpoint to
discover valid values for query params like `types=` and `baseModels=` on
[`GET /models`](./models), rather than hardcoding lists.

### Response

```json
{
  "ModelType": [
    "Checkpoint", "TextualInversion", "Hypernetwork", "AestheticGradient",
    "LORA", "LoCon", "DoRA", "Controlnet", "Upscaler", "MotionModule",
    "VAE", "Poses", "Wildcards", "Workflows", "Detection", "Other"
  ],
  "ModelFileType": [
    "Model", "Text Encoder", "Pruned Model", "Negative",
    "Training Data", "VAE", "Config", "Archive"
  ],
  "ActiveBaseModel": [
    "Flux.1 D", "Flux.2 D", "SDXL 1.0", "Illustrious",
    "Qwen", "Wan Video 2.2 T2V-A14B", "ZImageTurbo", "..."
  ],
  "BaseModel": [
    "SD 1.5", "SD 2.1", "SD 3.5", "SDXL 1.0", "Flux.1 D",
    "Illustrious", "Pony", "Hunyuan Video", "..."
  ],
  "BaseModelType": [
    "Standard", "Inpainting", "Refiner", "Pix2Pix"
  ]
}
```

Only the shape shown above is guaranteed; the list contents change as Civitai
adds support for new model families. Always fetch live values rather than
baking them into clients.

### Key distinctions

* **`ModelType`** — the kind of artifact (checkpoint vs. LoRA vs. VAE, etc.). Use as the `types=` filter on `GET /models`.
* **`ModelFileType`** — the role of a file *within* a model version (main model, VAE, text encoder, training data). Appears as `files[].type`.
* **`BaseModel`** — every base model Civitai has ever catalogued. Use as `baseModels=` when filtering.
* **`ActiveBaseModel`** — the subset of `BaseModel` that Civitai's on-site generation currently supports. If you're building around Orchestration workflows, filter to these.
* **`BaseModelType`** — sub-classification of a base model (e.g. Standard vs. Inpainting SDXL). Appears as `baseModelType` on model versions.

### Example

```bash
curl "https://civitai.com/api/v1/enums" | jq '.ModelType'
```
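
A client can use the live response to check filter values before calling `GET /models`. The sketch below is illustrative: `invalid_values` is a hypothetical helper, the sample dictionary is trimmed from the response above, and case-sensitive matching is an assumption this page does not confirm.

```python
from typing import Dict, List


def invalid_values(enums: Dict[str, List[str]], enum_name: str,
                   requested: List[str]) -> List[str]:
    """Return the requested values that are missing from the named enum."""
    allowed = set(enums.get(enum_name, []))
    return [value for value in requested if value not in allowed]


# Trimmed sample of the /api/v1/enums response shown above.
sample = {
    "ModelType": ["Checkpoint", "LORA", "VAE"],
    "BaseModelType": ["Standard", "Inpainting", "Refiner", "Pix2Pix"],
}

print(invalid_values(sample, "ModelType", ["LORA", "Lora"]))  # ['Lora']
```

In a real client, `sample` would be replaced by the decoded body of `GET /api/v1/enums`.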

---

---
url: /orchestration/recipes/ernie.md
---

# ERNIE image generation

Baidu's ERNIE Image is a distillation-friendly text-to-image family hosted on Civitai's Comfy workers. Single engine path, one operation (`createImage` — no img2img, variant, or edit support), two model variants that differ only in speed vs. quality:

* `engine: "comfy"`, `ecosystem: "ernie"`
* **Only `createImage`** — ERNIE doesn't expose `createVariant` or `editImage`. Use [Flux 2 Klein](./flux2#klein-createvariant-img2img) or [Qwen](./qwen) if you need img2img or prompt-driven editing.
* Built-in diffuser, VAE, and text encoder — no `model` URN to pick. The only `model` field is the variant selector (`ernie` or `turbo`).
* LoRA support (ERNIE-tagged LoRAs only).

## Variants

| `model` | Steps (default) | `cfgScale` (default) | Best for |
|---------|-----------------|----------------------|----------|
| `ernie` | `20` | `4` | **Default** — full-quality output, standard sampling |
| `turbo` | `8` | `1` | Distilled for speed: about 2× faster in measured claim duration and 40% of the Buzz per image at defaults; use for drafts, batches, and iteration |

Leave `cfgScale: 1` on `turbo` — it's a distilled model and doesn't respond to classifier-free guidance the way the standard variant does.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* No checkpoint URN required — both variants ship with built-in diffuser / VAE / text encoder

## Standard (`model: "ernie"`)

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "comfy",
      "ecosystem": "ernie",
      "model": "ernie",
      "operation": "createImage",
      "prompt": "A red panda wearing a yellow rain jacket, cinematic soft light, highly detailed",
      "width": 1024,
      "height": 1024,
      "steps": 20,
      "cfgScale": 4,
      "sampler": "euler",
      "scheduler": "simple",
      "quantity": 1
    }
  }]
}
```

### Parameters

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `prompt` | — ✅ | ≤ 10 000 chars | Natural-language descriptions work well; ERNIE handles complex scenes better than tag-soup. |
| `negativePrompt` | *(none)* | ≤ 10 000 chars | Optional. Shorter is usually better — ERNIE's defaults are already clean. |
| `width` / `height` | `1024` | `64`–`2048`, divisible by 16 | Trained around 1024². Well-behaved near ~1 megapixel total. |
| `steps` | `20` | `1`–`150` | Diminishing returns past ~25. |
| `cfgScale` | `4` | `0`–`30` | `3`–`5` is the sweet spot. |
| `sampler` | `euler` | enum | [`ComfySampler`](/orchestration/reference/). `euler` is what the model was tuned against. |
| `scheduler` | `simple` | enum | [`ComfyScheduler`](/orchestration/reference/). |
| `loras` | `{}` | `{ airUrn: strength }` | Stack multiple. Only `urn:air:ernie:lora:...` LoRAs work here. |
| `quantity` | `1` | `1`–`12` | Number of images per call. |
| `seed` | random | int64 | Pin for reproducibility. |
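
Because `width` and `height` must be multiples of 16 within `64`–`2048`, a client may want to snap arbitrary sizes before submitting. A minimal sketch; the helper is hypothetical and rounding to the nearest multiple is a design choice, not documented server behavior:

```python
def snap_dimension(value: int, lo: int = 64, hi: int = 2048, step: int = 16) -> int:
    """Round to the nearest multiple of `step`, then clamp into [lo, hi]."""
    snapped = (value + step // 2) // step * step
    return max(lo, min(hi, snapped))


print(snap_dimension(1000))  # 1008
print(snap_dimension(50))    # 64
print(snap_dimension(3000))  # 2048
```

Both bounds are themselves multiples of 16, so the clamped result always stays valid.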

### Portrait aspect ratio

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "comfy",
      "ecosystem": "ernie",
      "model": "ernie",
      "operation": "createImage",
      "prompt": "Portrait of a woman with flowing hair standing in a blooming cherry blossom field, golden hour lighting",
      "negativePrompt": "worst quality, blurry, low resolution",
      "width": 832,
      "height": 1216,
      "steps": 20,
      "cfgScale": 4,
      "sampler": "euler",
      "scheduler": "simple",
      "seed": 42
    }
  }]
}
```

## Turbo (`model: "turbo"`)

Distilled variant — same input surface as standard, just lower defaults for `steps` and `cfgScale`. Use this as the default when you're iterating on prompts or generating batches; fall back to `ernie` for hero shots.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "comfy",
      "ecosystem": "ernie",
      "model": "turbo",
      "operation": "createImage",
      "prompt": "A red panda wearing a yellow rain jacket, cinematic soft light, highly detailed",
      "width": 1024,
      "height": 1024,
      "steps": 8,
      "cfgScale": 1,
      "sampler": "euler",
      "scheduler": "simple",
      "quantity": 1
    }
  }]
}
```

Turbo-specific tuning:

| Field | Default | Notes |
|-------|---------|-------|
| `steps` | `8` | Stay in `6`–`12`. Pushing past `~16` wastes Buzz without improving output on the distilled model. |
| `cfgScale` | `1` | Distilled — leave at `1`. Raising it usually over-saturates / burns the output. |

Everything else (`prompt`, `negativePrompt`, dimensions, `sampler`, `scheduler`, `seed`, `quantity`, `loras`) matches the standard variant.

## Reading the result

ERNIE emits the standard `imageGen` output — an `images[]` array, one entry per `quantity`:

```json
{
  "status": "succeeded",
  "steps": [{
    "$type": "imageGen",
    "name": "$0",
    "status": "succeeded",
    "output": {
      "images": [
        {
          "id": "aa6e7228-68cd-4d15-b4d7-5005b2bfbac6-0.jpg",
          "width": 1024,
          "height": 1024,
          "url": "https://orchestration.civitai.com/v2/consumer/blobs/…?sig=…",
          "urlExpiresAt": "2027-04-15T17:18:54.3195353Z",
          "previewUrl": "https://orchestration.civitai.com/v2/consumer/blobs/…?sig=…",
          "previewUrlExpiresAt": "2027-04-15T17:18:54.3196735Z",
          "available": true,
          "nsfwLevel": "pg13"
        }
      ],
      "errors": []
    }
  }]
}
```

`url` and `previewUrl` are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL. The `nsfwLevel` field carries the moderation classification applied to the output.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Per-pixel + per-step scaling against 1024² and the variant's default step count:

**Standard** (`ComfyErnieStandardCreateImageGenInput.CalculateCost`):

```
total = 20 × (width × height / 1024²) × (steps / 20) × quantity
```

**Turbo** (`ComfyErnieTurboCreateImageGenInput.CalculateCost`):

```
total = 8 × (width × height / 1024²) × (steps / 8) × quantity
```

| Shape | Standard (Buzz) | Turbo (Buzz) |
|-------|-----------------|--------------|
| 1024² / default steps / `quantity: 1` (defaults) | **20** | **8** |
| 832×1216 / default steps / `quantity: 1` | ~20 | ~8 |
| 1024² / default steps / `quantity: 4` | ~80 | ~32 |
| 1024² / `steps: 40` (standard) / `steps: 16` (turbo) | ~40 | ~16 |

Standard pricing is ~2.5× turbo at defaults — reach for turbo when iterating on prompts.
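
The two formulas differ only in their base rate and default step count, so one function covers both. This is an illustrative sketch: `ernie_cost` is a hypothetical name, and how the orchestrator rounds fractional Buzz is not stated here, so the raw value is returned.

```python
# (base Buzz at 1024x1024, default steps) per variant, from the formulas above.
VARIANTS = {"ernie": (20, 20), "turbo": (8, 8)}


def ernie_cost(model: str, width: int = 1024, height: int = 1024,
               steps: int = None, quantity: int = 1) -> float:
    """base x (pixels / 1024^2) x (steps / default steps) x quantity."""
    base, default_steps = VARIANTS[model]
    if steps is None:
        steps = default_steps
    return base * (width * height / 1024**2) * (steps / default_steps) * quantity


print(ernie_cost("ernie"))             # 20.0
print(ernie_cost("turbo", quantity=4)) # 32.0
print(ernie_cost("ernie", 832, 1216))  # 19.296875
```

Use `whatif=true` for the exact charge; this only reproduces the published scaling.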

## Runtime

Claim duration (`job.startedAt` → `job.completedAt`) measured against `orchestration-next` with `quantity: 1`:

| Variant | Shape | Claim duration |
|---------|-------|----------------|
| `ernie` (standard) | 1024² / 20 steps | ~29 s |
| `ernie` (standard) | 832×1216 / 20 steps | ~27 s |
| `turbo` | 1024² / 8 steps | ~13 s |

`wait=60` covers single-image calls comfortably. For `quantity > 1`, larger dimensions, or high `steps` counts, compute + queue wait typically runs past the 60 s long-poll ceiling — submit with `wait=60` and re-issue `GET /v2/consumer/workflows/{id}?wait=60` on a loop until the response is terminal (see [Submitting work → Waiting for results](/orchestration/guide/submitting-work#waiting-for-results)), or register a webhook.
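
The re-issue loop described above might look like the following sketch. It assumes the `GET` response carries a top-level `status` field and uses the terminal statuses named in [Errors & retries](/orchestration/guide/errors-and-retries#step-level-failures); the function names and the `max_polls` guard are illustrative.

```python
import json
import urllib.request
from typing import Callable

TERMINAL = {"succeeded", "failed", "expired", "canceled"}


def get_workflow(workflow_id: str, token: str, wait: int = 60) -> dict:
    """Long-poll GET /v2/consumer/workflows/{id}?wait=N."""
    url = (
        "https://orchestration.civitai.com/v2/consumer/workflows/"
        f"{workflow_id}?wait={wait}"
    )
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_workflow(workflow_id: str, fetch: Callable[[str], dict],
                      max_polls: int = 20) -> dict:
    """Re-issue the long-poll until the workflow reaches a terminal status."""
    for _ in range(max_polls):
        workflow = fetch(workflow_id)
        if workflow.get("status") in TERMINAL:
            return workflow
    raise TimeoutError(f"{workflow_id} not terminal after {max_polls} polls")


# Usage (performs real HTTP requests):
# done = wait_for_workflow(wf_id, lambda i: get_workflow(i, token="<your-token>"))
```

Each poll already blocks for up to `wait` seconds server-side, so no extra sleep is added between iterations.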

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "No derived type found for discriminator value 'ernie'" on `ecosystem` | ERNIE not yet rolled out to the environment you're hitting | Confirm the target orchestrator lists `ernie` in `/openapi/v2-consumers.json` → `ComfyImageGenInput` `ecosystem` enum. Retry after rollout. |
| `400` with "operation must be createImage" | Passed `editImage` or `createVariant` | ERNIE only supports `createImage`. Use [Qwen](./qwen) or [Flux 2 Klein](./flux2#klein-createvariant-img2img) for img2img / edit. |
| `400` on `model` | Sent a full AIR URN or a value other than `ernie` / `turbo` | The `model` field is a variant selector, not a checkpoint URN. Only `"ernie"` and `"turbo"` are valid. |
| `400` on `width` / `height` | Value not divisible by 16, or outside `64`–`2048` | Round to a valid multiple of 16 inside that range. |
| Turbo output looks over-saturated / blown out | `cfgScale > 1` on the distilled model | Set `cfgScale: 1` for turbo. Raise `steps` instead if you want more fidelity. |
| Standard output ignores the prompt | `cfgScale` too low | Bump toward `4`–`6`. `cfgScale: 1` on standard barely steers the model. |
| LoRA silently has no effect | Wrong AIR URN, or ecosystem mismatch | Only `urn:air:ernie:lora:…` LoRAs work here. Verify the URN on the LoRA's Civitai page. |
| Request timed out (`wait` expired) | Large `quantity`, atypical dimensions, or high `steps` | Don't resubmit; the workflow is still running. Resume with a `GET …?wait=60` loop on its ID, or register a webhook. |
| Step `failed`, `reason = "blocked"` | Prompt hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Qwen image generation](./qwen) — alternative with edit + variant operations and LoRA support
* [Flux 2 image generation](./flux2) — higher-fidelity general-purpose alternative with `createVariant`
* [Z-Image generation](./zimage) — the other distilled, extremely cheap + fast image recipe (sdcpp-based)
* [Anima image generation](./anima) — anime-tuned sdcpp ecosystem, same single-operation shape
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref`
* Full parameter catalog: `ComfyErnieStandardCreateImageGenInput` and `ComfyErnieTurboCreateImageGenInput` in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator

---

---
url: /site/guide/errors.md
description: Error response shape and HTTP status codes used by the Civitai site API.
---

# Errors

## Response shape

Most errors come back as a single-field JSON object:

```json
{ "error": "descriptive message" }
```

Some errors originate inside the internal tRPC layer and get forwarded with a
richer shape:

```json
{
  "code": "UNAUTHORIZED",
  "message": "descriptive message",
  "issues": [ /* optional Zod validation details */ ]
}
```

Either way, inspect the HTTP status code first and use the body for a
human-readable explanation.
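
A client can normalize both body shapes into one message. The sketch below is an assumption about how you might do that; `error_message` is a hypothetical helper, and the key names are the ones shown in the two examples above.

```python
def error_message(body: dict) -> str:
    """Extract a readable message from either error shape."""
    if "error" in body:
        return str(body["error"])
    # tRPC-forwarded shape: code, message, optional issues[]
    parts = [body.get("code"), body.get("message")]
    text = ": ".join(p for p in parts if p)
    issues = body.get("issues") or []
    if issues:
        text += f" ({len(issues)} validation issue(s))"
    return text or "unknown error"


print(error_message({"error": "No model with id 0"}))
print(error_message({"code": "UNAUTHORIZED", "message": "descriptive message"}))
```

Branch on the HTTP status code first; this helper only formats the explanation.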

## Status codes

| Status | Meaning | Typical cause |
|--------|---------|---------------|
| **200** | OK | Successful read. |
| **400** | Bad Request | Invalid query parameters. For list endpoints, response body includes Zod-style validation issues. Combining `?query=` with `?page=` also returns 400. |
| **401** | Unauthorized | Missing or invalid token on an `Authenticated` endpoint, or an auth-only filter (e.g. `?favorites=true`) on a `Mixed` endpoint without a session. |
| **403** | Forbidden | Valid token, but the user is not permitted to access the resource. |
| **404** | Not Found | Unknown model, version, or hash. Body shape: `{"error": "No model with id 0"}`. |
| **405** | Method Not Allowed | Wrong HTTP verb for the endpoint. |
| **429** | Too Many Requests | Either edge rate limiting (Cloudflare) or the `page * limit > 1000` pagination cap — see [Pagination](./pagination). |
| **500** | Internal Server Error | Unexpected failure. Safe to retry with backoff. |

## Retries

The API does not expose a `Retry-After` header for most failures. For 5xx and
429 responses, apply exponential backoff starting at ~1 second and cap at
~30 seconds. Don't retry 4xx responses other than 429: the request shape
itself is the problem.
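
The schedule described above can be sketched as a doubling delay with a cap. The function names are illustrative, and doubling is one common reading of "exponential"; the API itself prescribes no exact factor.

```python
from typing import List


def should_retry(status: int) -> bool:
    """Retry only 429 and 5xx responses."""
    return status == 429 or 500 <= status <= 599


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delay in seconds before each retry: 1, 2, 4, ... capped at 30."""
    return [min(cap, base * 2**i) for i in range(attempts)]


print(backoff_delays(7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```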

## Rate limits

There is no per-endpoint rate limit exposed as a stable contract. Cloudflare
enforces edge limits (generic DDoS / abuse protection) in front of the API;
those are operational, not a published SLA. Treat unexpected 429s as a signal
to back off, not as a scheme to code against.

---

---
url: /orchestration/guide/errors-and-retries.md
---

# Errors & Retries

Errors surface in two places:

1. **HTTP responses** on the submit / get / update endpoints — validation, auth, rate-limit, or server issues.
2. **Step status** inside a returned workflow — the request succeeded, but a step reached `failed`, `expired`, or `canceled`.

The orchestrator races providers internally, so transient provider failures (one worker crashing, one region being slow) are usually invisible to you — a different provider claims the job. You only see step-level failures once every viable provider has been exhausted.

## HTTP error shape

Validation and error responses use the standard [RFC 7807](https://www.rfc-editor.org/rfc/rfc7807) `application/problem+json` shape ([`ProblemDetails`](/orchestration/reference/operations/SubmitWorkflow) / [`ValidationProblemDetails`](/orchestration/reference/operations/SubmitWorkflow)):

```json
{
  "type": "https://tools.ietf.org/html/rfc9110#section-15.5.1",
  "title": "One or more validation errors occurred.",
  "status": 400,
  "errors": {
    "steps[0].input.resolution": [
      "The value '4k' is not valid for resolution."
    ]
  }
}
```

Fields:

* `status` — the HTTP status code (also in the response line).
* `title` — short, human-readable summary.
* `detail` — longer description when the orchestrator can give one.
* `errors`: present on `400` validation failures; maps field paths such as `steps[0].input.resolution` to a list of messages.
* `instance` / `type` — optional URIs identifying the specific occurrence and error class.
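
To surface validation failures in logs or a UI, the `errors` map can be flattened into one line per message. A minimal sketch; `validation_messages` is a hypothetical helper and the sample body is the one shown above.

```python
from typing import List


def validation_messages(problem: dict) -> List[str]:
    """Flatten a ProblemDetails `errors` map into 'path: message' strings."""
    return [
        f"{path}: {message}"
        for path, messages in problem.get("errors", {}).items()
        for message in messages
    ]


problem = {
    "title": "One or more validation errors occurred.",
    "status": 400,
    "errors": {
        "steps[0].input.resolution": ["The value '4k' is not valid for resolution."]
    },
}

for line in validation_messages(problem):
    print(line)
```

Responses without an `errors` key (for example a `401`) simply yield an empty list.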

## HTTP status taxonomy

| Code | Meaning | Retry? |
|------|---------|--------|
| `400 Bad Request` | Body failed validation. `errors` map tells you exactly which fields. | **No** — fix the request. |
| `401 Unauthorized` | Missing / expired / malformed bearer token. | **No** — obtain a new token. |
| `403 Forbidden` | Token is valid but can't perform this operation (recipe not enabled for your tier, mature content not permitted, etc.). | **No** — request access; don't retry as-is. |
| `404 Not Found` | Workflow or blob ID doesn't exist (or the token can't see it). | **No** — check the ID. |
| `409 Conflict` | The workflow is in a state that blocks this mutation (e.g. updating a step that already started). | **Maybe** — refetch and reconcile. |
| `429 Too Many Requests` | Rate-limit hit. | **Yes, with backoff.** |
| `5xx Server Error` | Transient orchestrator issue. | **Yes, with backoff.** |

Retry guidance for `429` / `5xx`: exponential backoff with jitter, capped at ~30 s between attempts, give up after ~5 tries. Don't retry `400` / `401` / `403` / `404` until you've fixed the underlying issue.
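
One way to apply that guidance is a small retry wrapper. This is a sketch under stated assumptions: `send` is a hypothetical callable returning `(status, body)`, the 1-second base delay is not specified by the orchestrator, and "full jitter" (a uniform draw between zero and the capped delay) is just one common jitter strategy.

```python
import random
import time
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    """429 and 5xx are retryable; other 4xx need a fixed request."""
    return status == 429 or 500 <= status <= 599


def jittered_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * 2**attempt))


def call_with_retries(send: Callable[[], Tuple[int, dict]], max_tries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call `send` until it returns a non-retryable status or tries run out."""
    for attempt in range(max_tries):
        status, body = send()
        if not is_retryable(status) or attempt == max_tries - 1:
            return status, body
        sleep(jittered_delay(attempt))
```

Injecting `sleep` keeps the wrapper testable without real delays.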

## Step-level failures

A `200` / `202` on submit means the workflow was accepted — individual steps can still fail later. The [`Workflow`](/orchestration/reference/operations/GetWorkflow) payload you get back (or that arrives by webhook) carries per-step status:

```json
{
  "id": "wf_01HXYZ...",
  "status": "failed",
  "steps": [
    {
      "name": "0",
      "$type": "videoGen",
      "status": "failed",
      "jobs": [
        {
          "status": "failed",
          "reason": "no_provider_available",
          "blockedReason": null
        }
      ]
    }
  ]
}
```

Workflow / step / job statuses share the same enum: `unassigned`, `preparing`, `scheduled`, `processing`, `succeeded`, `failed`, `expired`, `canceled`. Terminal states are `succeeded`, `failed`, `expired`, `canceled` — once reached, [they do not change](./results-and-webhooks#delivery-semantics).

The `reason` and `blockedReason` fields on failed jobs are the best hint at *why*:

| `reason` | What it means | Your move |
|----------|---------------|-----------|
| `no_provider_available` | No provider can run this job with the given inputs (unusual resolution, unsupported duration, restricted region, etc.). | Relax inputs, try another `provider`/`version`, or retry later. |
| `blocked` | The job was blocked by content moderation. `blockedReason` explains further. | Don't retry the same input; rework the prompt or image. |
| `timeout` / `expired` | Job exceeded its internal deadline. | Safe to resubmit — possibly with a smaller workload. |
| `canceled` | Someone (you or an operator) canceled the workflow via [`DeleteWorkflow`](/orchestration/reference/operations/DeleteWorkflow). | No retry unless you actually want to re-run it. |

When `reason` is absent, the failure is generic — safe to retry once with the same body.
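
The table translates naturally into a dispatch function. The action labels below are invented for illustration; only the `reason` values come from the table above.

```python
def next_action(job: dict) -> str:
    """Map a failed job's `reason` to a suggested follow-up."""
    reason = job.get("reason")
    if reason is None:
        return "retry-once"        # generic failure: one retry with the same body
    if reason == "blocked":
        return "rework-input"      # moderation: do not resend the same input
    if reason in ("timeout", "expired"):
        return "resubmit"          # possibly with a smaller workload
    if reason == "no_provider_available":
        return "relax-inputs-or-retry-later"
    if reason == "canceled":
        return "no-retry"
    return "inspect"               # unknown reason: look at the payload


job = {"status": "failed", "reason": "no_provider_available", "blockedReason": None}
print(next_action(job))  # relax-inputs-or-retry-later
```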

## Webhook retries

If you've registered callbacks, the orchestrator retries transient failures on your endpoint automatically. See [Results & webhooks → Delivery semantics](./results-and-webhooks#delivery-semantics) for the serialization guarantees you can rely on.

## Common gotchas

* **Blob URLs 403 after a few minutes.** The signed URL expired — refetch the workflow (or call [`GetBlob`](/orchestration/reference/operations/GetBlob)) for a fresh one. This isn't a real failure.
* **`202` after `wait=90`.** The workflow didn't finish within the [100-second request timeout](./getting-started#_3-poll-if-you-didn-t-wait-inline). Expected for video / training / large-batch jobs — continue via webhooks or polling.
* **Step `canceled` unexpectedly.** Check whether another process called [`DeleteWorkflow`](/orchestration/reference/operations/DeleteWorkflow). The orchestrator itself only cancels on explicit request or when a dependent step already failed.

---

---
url: /orchestration/recipes/flux1.md
---

# Flux 1 image generation

Flux 1 is Black Forest Labs' original open-weights family (Dev / Schnell plus the commercial Kontext tier). The whole family is the **`flux1` ecosystem** on the orchestrator — same checkpoint family, same AIR prefix (`urn:air:flux1:…`), same resource pool for workers and capability matching. What differs is how you *invoke* it: there's no single `engine: "flux1"` discriminator, so you pick one of three `engine` values depending on what you want:

| `engine` | Best for | Notes |
|----------|----------|-------|
| `sdcpp` (ecosystem `flux1`) | **Default** — Stable Diffusion C++ on Civitai workers | Only `diffuserModel` is required; VAE / CLIP-L / T5-XXL default to sensible components. Supports LoRAs, `createImage` / `createVariant` / `editImage`. |
| `comfy` (ecosystem `flux1`) | When you specifically need ComfyUI sampler knobs | Full sampler/scheduler enum control, LoRA support, checkpoint via AIR URN. Picks a heavier worker than sdcpp — reach for this only if you need a Comfy-specific sampler. |
| `flux1-kontext` (ecosystem `flux1`) | Image editing / prompt-based edits via BFL's managed Kontext API | `dev` / `pro` / `max` tiers; the `ecosystem` field isn't in the request body but the endpoint lives in the same ecosystem internally |

**Default choice for new integrations**: `engine: "sdcpp"`, `ecosystem: "flux1"`. Sdcpp's defaults handle the component models for you, so you only need to pick a diffuser. Reach for `comfy` when you need a specific Comfy sampler; use `flux1-kontext` when you want BFL's managed editor.

If you're starting fresh and don't need Flux.1 specifically, consider [Flux 2](./flux2) — cleaner schema, better quality, same orchestration-side usage.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A Flux.1 diffuser AIR URN (for sdcpp / comfy paths) — browse the [Civitai Flux 1.D catalog](https://civitai.com/models?baseModels=Flux.1+D)
* For `createVariant` / `editImage` / Kontext editing: one or more source image URLs

## sdcpp (default path)

Runs Flux.1 on Civitai's sdcpp workers. Minimal required input — just pick a diffuser and write a prompt. Every other model component (VAE, CLIP-L, T5-XXL) has a working default; LoRAs, samplers, and dimensions are tunable.

### Text-to-image

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "flux1",
      "operation": "createImage",
      "diffuserModel": "urn:air:flux1:diffuser:civitai:618692@691639",
      "prompt": "A photorealistic portrait of a woman in a cyberpunk city, neon reflections",
      "width": 1024,
      "height": 1024
    }
  }]
}
```

Common sdcpp-flux1 parameters:

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `diffuserModel` | — ✅ | AIR URN | The only required model component. A Flux.1 diffuser from the catalog. |
| `prompt` | — ✅ | ≤ 1000 chars | Natural-language descriptions work best on Flux. |
| `width` / `height` | `1024` | `832`–`1216`, divisible by 16 | Tighter than Comfy's `64`–`2048`. |
| `steps` | `28` | `4`–`50` | Sampler steps. Diminishing returns past ~30. |
| `cfgScale` | `3.5` | `1`–`20` | Classifier-free guidance. `2.5`–`4` is the sweet spot for Flux. |
| `sampleMethod` | `euler` | enum | See [`SdCppSampleMethod`](/orchestration/reference/). |
| `schedule` | `simple` | enum | See [`SdCppSchedule`](/orchestration/reference/). |
| `negativePrompt` | *(none)* | string | Available — Comfy/Kontext flux1 variants don't expose one. |
| `loras` | `{}` | `{ airUrn: strength }` | Stack multiple; strengths in `0.0`–`2.0` are typical. |
| `quantity` | `1` | `1`–`4` | Number of images per call. |
| `seed` | random | int64 | Pin for reproducibility. |
| `vaeModel` | *(default)* | AIR URN | Override the default VAE. Usually unnecessary. |
| `clipLModel` | *(default)* | AIR URN | Override the default CLIP-L. |
| `t5XXLModel` | *(default)* | AIR URN | Override the default T5-XXL text encoder. |

The default component URNs (Green-Sky's quantized GGUF releases on HuggingFace) are what the orchestrator falls back to when you omit `vaeModel` / `clipLModel` / `t5XXLModel`. They work out of the box — override only if you need a specific quantization or cached component.
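
A client-side builder can enforce the table's limits before a request is sent. This sketch is illustrative: `flux1_sdcpp_input` is a hypothetical helper, and it checks only the prompt length and dimension constraints listed above.

```python
def flux1_sdcpp_input(diffuser_model: str, prompt: str,
                      width: int = 1024, height: int = 1024, **extra) -> dict:
    """Build an imageGen `input` object for sdcpp / flux1 createImage."""
    if len(prompt) > 1000:
        raise ValueError("prompt must be at most 1000 characters")
    for name, value in (("width", width), ("height", height)):
        if not 832 <= value <= 1216 or value % 16 != 0:
            raise ValueError(f"{name} must be 832-1216 and divisible by 16")
    return {
        "engine": "sdcpp",
        "ecosystem": "flux1",
        "operation": "createImage",
        "diffuserModel": diffuser_model,
        "prompt": prompt,
        "width": width,
        "height": height,
        **extra,  # optional fields such as steps, cfgScale, loras, seed
    }


body = {
    "steps": [{
        "$type": "imageGen",
        "input": flux1_sdcpp_input(
            "urn:air:flux1:diffuser:civitai:618692@691639",
            "A photorealistic portrait of a woman in a cyberpunk city",
            seed=42,
        ),
    }]
}
```

The resulting `body` matches the shape of the text-to-image request above and can be serialized with `json.dumps`.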

### With LoRAs

LoRAs are a map of AIR URN → strength, identical shape to Comfy:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "flux1",
      "operation": "createImage",
      "diffuserModel": "urn:air:flux1:diffuser:civitai:618692@691639",
      "prompt": "A detailed anime character in a magical forest, ethereal lighting",
      "width": 1024,
      "height": 1024,
      "loras": {
        "urn:air:flux1:lora:civitai:123456@789012": 0.8
      }
    }
  }]
}
```

### Image-to-image (`createVariant`)

Pass a source image and a new prompt; the model re-imagines it. `strength` controls how far the output departs from the source: `0.0` returns the source unchanged, `1.0` discards it entirely. `0.6`–`0.8` is the "keep composition, change style" sweet spot.

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "flux1",
      "operation": "createVariant",
      "diffuserModel": "urn:air:flux1:diffuser:civitai:618692@691639",
      "prompt": "Make it daytime with clear blue sky",
      "width": 1024,
      "height": 1024,
      "image": "https://image.civitai.com/.../source.jpeg",
      "strength": 0.7
    }
  }]
}
```

Note `image` is a plain string URL (not a `{ url: ... }` wrapper), and the field is `strength` (not `denoiseStrength` like on Comfy).

### Edit image (`editImage`)

Alternative to `createVariant` — accepts up to two reference images and treats the prompt as an edit instruction rather than a variant direction:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "flux1",
      "operation": "editImage",
      "diffuserModel": "urn:air:flux1:diffuser:civitai:618692@691639",
      "prompt": "Make it a winter scene with snow falling",
      "width": 1024,
      "height": 1024,
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

`images[]` takes up to 2 entries. Use `createVariant` when you want a strength-weighted re-imagining of a single source; use `editImage` when you want prompt-driven surgery (a more literal "do X to this picture" interpretation).
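
As a rough sketch of that choice in Python: the helper below (a hypothetical name, not part of any SDK) builds a `createVariant` step when a `strength` is supplied and an `editImage` step otherwise. The `0.0`–`1.0` bound on `strength` and the requirement of at least one reference image are assumptions inferred from the descriptions above, not confirmed validation rules.

```python
import json
from typing import Any, Optional


def flux1_sdcpp_image_step(
    diffuser_model: str,
    prompt: str,
    sources: list[str],
    strength: Optional[float] = None,
) -> dict[str, Any]:
    """Build an imageGen step: createVariant if `strength` is given, else editImage."""
    step_input: dict[str, Any] = {
        "engine": "sdcpp",
        "ecosystem": "flux1",
        "diffuserModel": diffuser_model,
        "prompt": prompt,
        "width": 1024,
        "height": 1024,
    }
    if strength is not None:
        # createVariant: one source image as a plain string, plus `strength`.
        if len(sources) != 1:
            raise ValueError("createVariant takes exactly one source image")
        if not 0.0 <= strength <= 1.0:  # assumed range
            raise ValueError("strength must be between 0.0 and 1.0")
        step_input.update(operation="createVariant", image=sources[0], strength=strength)
    else:
        # editImage: `images[]` with up to two references.
        if not 1 <= len(sources) <= 2:
            raise ValueError("editImage takes one or two reference images")
        step_input.update(operation="editImage", images=list(sources))
    return {"$type": "imageGen", "input": step_input}


step = flux1_sdcpp_image_step(
    "urn:air:flux1:diffuser:civitai:618692@691639",
    "Make it a winter scene with snow falling",
    ["https://image.civitai.com/.../source.jpeg"],
)
print(json.dumps({"steps": [step]}, indent=2))
```

Keeping the decision in one function makes it harder to send `strength` with `editImage` or `images[]` with `createVariant`, both of which the engine-specific schemas would likely reject.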

## Comfy (ComfyUI-specific knobs)

When you need controls specific to ComfyUI's sampler surface — `ComfySampler` / `ComfyScheduler` enum values, a single-checkpoint AIR URN instead of separate components, or `denoiseStrength` semantics on img2img — use `engine: "comfy"`. Otherwise prefer `sdcpp`.

### Text-to-image

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "comfy",
      "ecosystem": "flux1",
      "operation": "createImage",
      "model": "urn:air:flux1:checkpoint:civitai:618692@691639",
      "prompt": "A photorealistic portrait of a woman in a cyberpunk city, neon reflections",
      "width": 1024,
      "height": 1024,
      "steps": 20,
      "cfgScale": 3.5,
      "sampler": "euler",
      "scheduler": "simple",
      "quantity": 1
    }
  }]
}
```

Key differences from sdcpp:

| Field | sdcpp | comfy |
|-------|-------|-------|
| Model spec | `diffuserModel` (+ optional components) | `model` — single checkpoint AIR URN |
| Sampler | `sampleMethod` ([`SdCppSampleMethod`](/orchestration/reference/)) | `sampler` ([`ComfySampler`](/orchestration/reference/)) |
| Schedule | `schedule` ([`SdCppSchedule`](/orchestration/reference/)) | `scheduler` ([`ComfyScheduler`](/orchestration/reference/)) |
| Img2img strength | `strength` (`createVariant`) | `denoiseStrength` (`createVariant`) |
| Max `quantity` | `4` | `12` |
| Max `width` / `height` | `1216` | `2048` |
| `negativePrompt` | ✅ | — |

Comfy also supports `createVariant` with the same shape: a plain `image` string (URL, data URL, or Base64), with `denoiseStrength` in place of sdcpp's `strength`. See the [`ComfyFlux1VariantImageGenInput` schema](/orchestration/reference/) for the full field list.

## flux1-kontext (managed editing tier)

`flux1-kontext` stays inside the `flux1` ecosystem (same checkpoint family, same AIR prefix for any LoRAs/models you'd reference elsewhere in the `flux1` ecosystem) but routes inference to BFL's managed Kontext provider. It offers three model tiers (`dev`/`pro`/`max`) and a simpler input schema: just `prompt`, optional `images[]`, and `aspectRatio`. There is no checkpoint selection, no LoRAs, and no sampler knobs. You trade control for convenience: BFL handles quality; you handle prompts and reference images.

### Text-to-image

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "flux1-kontext",
      "model": "pro",
      "prompt": "A photograph of a cat wearing a tiny astronaut helmet",
      "quantity": 1
    }
  }]
}
```

### Image editing (the Kontext strength)

Pass `images[]` to edit an existing image via prompt:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "flux1-kontext",
      "model": "max",
      "prompt": "Make it daytime",
      "quantity": 1,
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

Kontext models:

| `model` | Notes |
|---------|-------|
| `dev` | Open-weights tier. Cheapest Kontext option. |
| `pro` | Commercial tier — BFL's standard production model. Default recommendation. |
| `max` | Top tier — highest quality, slowest, most expensive. Use for hero shots. |

Kontext-specific parameters:

| Field | Default | Notes |
|-------|---------|-------|
| `prompt` | *(required)* | ≤ 1000 chars. |
| `images[]` | — | URLs, data URLs, or Base64. When present → image-edit mode. Omit → text-to-image. |
| `aspectRatio` | `1:1` | Enum: `21:9`, `16:9`, `4:3`, `3:2`, `1:1`, `2:3`, `3:4`, `9:16`, `9:21`. |
| `guidanceScale` | `3.5` | `1`–`20`. |
| `quantity` | `1` | `1`–`4`. |
| `seed` | random | int64. |

## Reading the result

All Flux 1 paths emit the standard `imageGen` output — an `images[]` array, one entry per `quantity`:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.jpeg" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
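
A minimal Python sketch of reading that response, assuming it has already been parsed into a dict. The function name is illustrative; the field names are the ones shown in the JSON above.

```python
from typing import Any


def image_urls(workflow: dict[str, Any]) -> list[str]:
    """Collect the signed image URLs from every succeeded imageGen step."""
    urls: list[str] = []
    for step in workflow.get("steps", []):
        if step.get("$type") != "imageGen" or step.get("status") != "succeeded":
            continue
        # `output` may be missing or null on steps that have not produced anything.
        for image in (step.get("output") or {}).get("images", []):
            urls.append(image["url"])
    return urls


response = {
    "status": "succeeded",
    "steps": [{
        "name": "0",
        "$type": "imageGen",
        "status": "succeeded",
        "output": {"images": [{"id": "blob_...", "url": "https://.../signed.jpeg"}]},
    }],
}
print(image_urls(response))
```

Because the URLs expire, it is usually safer to download the images right away than to store the URLs for later.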

## Runtime

| Path | Typical wall time per 1024×1024 image | `wait` recommendation |
|------|---------------------------------------|-----------------------|
| `sdcpp + flux1` | 10–30 s | `wait=60` usually fine |
| `comfy + flux1` | 10–30 s (LoRAs add a few seconds each) | `wait=60` usually fine |
| `flux1-kontext` (dev / pro) | 10–30 s depending on BFL queue | `wait=60` usually fine |
| `flux1-kontext` (max) | 15–60 s | `wait=60` sometimes works; fall back to `wait=0` during busy periods |

`quantity > 2` or large dimensions push you toward the 100-second request timeout — submit with `wait=0` and poll instead.
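
The submit-then-poll pattern might look like the Python sketch below. It deliberately takes the fetch operation as a callable (for example a thin wrapper around a `GetWorkflow` call) so that no endpoint path has to be guessed here. The set of terminal statuses is an assumption: this page only shows `succeeded` and `failed`.

```python
import time
from typing import Any, Callable

# Assumption: the statuses that mean "stop polling". Extend this set if the
# API reports other terminal states.
TERMINAL_STATUSES = {"succeeded", "failed"}


def poll_workflow(
    fetch: Callable[[], dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call `fetch` until the workflow reaches a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        workflow = fetch()
        if workflow.get("status") in TERMINAL_STATUSES:
            return workflow
        if time.monotonic() >= deadline:
            raise TimeoutError("workflow did not finish before the polling deadline")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests; in production the default `time.sleep` is enough.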

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

**sdcpp path** (`Flux1SdCppImageGenInput.CalculateCost`):

```
base  = 0.5 × steps × (editImages + 1) × (cfgScale == 1 ? 1 : 2)
total = base × quantity
```
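
For a quick client-side estimate, the formula above translates directly into Python. This is only a sketch of the arithmetic as written here (the function name is illustrative); `whatif=true` remains the authoritative source for the actual charge.

```python
def flux1_sdcpp_cost(
    steps: int,
    cfg_scale: float,
    quantity: int = 1,
    edit_images: int = 0,
) -> float:
    """Estimate Buzz for an sdcpp Flux 1 request from the documented formula."""
    # The formula doubles the cost whenever cfgScale is anything other than 1.
    cfg_multiplier = 1 if cfg_scale == 1 else 2
    base = 0.5 * steps * (edit_images + 1) * cfg_multiplier
    return base * quantity


print(flux1_sdcpp_cost(steps=28, cfg_scale=3.5))                 # 28.0
print(flux1_sdcpp_cost(steps=28, cfg_scale=3.5, quantity=4))     # 112.0
print(flux1_sdcpp_cost(steps=28, cfg_scale=3.5, edit_images=1))  # 56.0
```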

| Shape | Buzz |
|-------|------|
| `createImage`, `steps: 28`, `cfgScale: 3.5`, `quantity: 1` | **~28** |
| `createImage`, `steps: 28`, `quantity: 4` | ~112 |
| `createVariant`, `quantity: 1` | ~28 |
| `editImage` with 1 reference | ~56 |

**Comfy path** (`ComfyFlux1ImageGenInput.CalculateCost`) — per-pixel + per-step scaling:

```
total = 8 × (width × height / 1024²) × (steps / 20) × quantity
```

At 1024² / `steps: 20` / `quantity: 1` → **~8 Buzz**. Comfy scales linearly with pixels and steps: 512² costs a quarter as much, 2048² quadruples, `steps: 40` doubles, and so on.
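
The same formula as a small Python function, useful for checking how a change in size or steps moves the estimate. The function name is illustrative, and the result is an estimate from the formula above rather than a billing guarantee.

```python
def flux1_comfy_cost(width: int, height: int, steps: int, quantity: int = 1) -> float:
    """Estimate Buzz for a Comfy Flux 1 request: linear in pixels, steps, and quantity."""
    pixel_factor = width * height / 1024**2  # 1.0 at 1024 x 1024
    step_factor = steps / 20                 # 1.0 at 20 steps
    return 8 * pixel_factor * step_factor * quantity


print(flux1_comfy_cost(1024, 1024, steps=20))  # 8.0
print(flux1_comfy_cost(512, 512, steps=20))    # 2.0
print(flux1_comfy_cost(2048, 2048, steps=20))  # 32.0
print(flux1_comfy_cost(1024, 1024, steps=40))  # 16.0
```

Because the cost follows pixel count rather than edge length, halving both dimensions cuts the estimate to a quarter.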

**Kontext** (`flux1-kontext`, BFL-hosted) — flat per-image by tier:

| Tier | Buzz per image |
|------|----------------|
| `dev` | **~35** |
| `pro` | **~45** |
| `max` | **~90** |

Multiply by `quantity`. No per-step / per-pixel scaling since Kontext doesn't expose those knobs.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with unknown property | Field not valid for this `engine` (e.g. `sampler` on sdcpp, `sampleMethod` on comfy, `loras` on `flux1-kontext`) | Match the schema for your chosen engine — see the tables above. |
| `400` with "diffuserModel is required" | sdcpp `createImage` / `createVariant` / `editImage` without a diffuser | Supply `diffuserModel` — the only required model component on sdcpp. VAE / CLIP-L / T5-XXL default automatically. |
| `400` with "model must match AIR pattern" | Passed a bare model ID or version slug | Use a full AIR URN: `urn:air:flux1:diffuser:civitai:<modelId>@<versionId>` (sdcpp) or `urn:air:flux1:checkpoint:civitai:<modelId>@<versionId>` (comfy). |
| `400` with "width/height out of range" on sdcpp | sdcpp clamps tighter than Comfy (`832`–`1216`, divisible by 16) | Round to a valid multiple of 16 inside that range, or switch to the Comfy engine for more freedom. |
| Output ignores the prompt on Flux.1 | `cfgScale` too low or prompt too short | Raise `cfgScale` toward 4; add lighting / composition / camera cues. |
| LoRA silently has no effect | Wrong AIR URN, unpublished / private model | Verify the URN on the LoRA's Civitai page; strengths outside `0.0`–`2.0` may also be clamped. |
| Kontext edit returns a generation unrelated to the source | `images[]` URL not reachable by BFL | Use a CDN-served URL (Civitai CDN works); see the source-URL notes in [Transcription → Choosing a source URL](./transcription). |
| Request timed out (`wait` expired) | Large `quantity`, Kontext `max` on a busy queue | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Prompt or input image hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Flux 2 image generation](./flux2) — newer Flux family with a cleaner schema, higher quality
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref` (use `ecosystem: "flux1"` on the enhancer)
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling longer runs
* Full parameter catalog: the `Flux1SdCpp<Operation>Input`, `ComfyFlux1<Operation>Input`, `Flux1Kontext<Model>ImageGenInput` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator

---

---
url: /orchestration/recipes/training-flux1.md
---

# Flux 1 LoRA training

Train a Flux.1 LoRA on your own image dataset using AI Toolkit. The output LoRA is usable directly in [Flux 1 image generation](./flux1) (sdcpp or Comfy paths).

| `modelVariant` | Base model | Inference characteristics |
|----------------|-----------|---------------------------|
| `dev` (default) | `black-forest-labs/FLUX.1-dev` | Higher fidelity, ~20–28 sampler steps. Good default for most LoRAs. |
| `schnell` | `black-forest-labs/FLUX.1-schnell` | Faster inference, 4 sampler steps, no CFG. Use when you specifically want a Schnell-targeted LoRA. |

The base checkpoint is fixed by `modelVariant` — there's no `model` field to override. To train on a non-BFL Flux.1 finetune, use the [SDXL & SD1](./training-sdxl-sd1) or [other-image](./training-other-image) ecosystems instead.

::: tip Long-running step
Flux 1 training is the most expensive AI Toolkit ecosystem (200 Buzz/epoch) and runs for roughly 1–2 min per epoch on a typical 10-image dataset. Always use `wait=0` and follow up via polling or a webhook; see [Results & webhooks](/orchestration/guide/results-and-webhooks).
:::

## The request shape

```json
{
  "$type": "training",
  "input": {
    "engine":       "ai-toolkit",
    "ecosystem":    "flux1",
    "modelVariant": "dev"        // dev | schnell
  }
}
```

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A training-data zip uploaded to a reachable URL (signed R2 URL, Civitai R2 AIR, or any HTTPS URL)
* An accurate `count` of images in the zip

## Flux 1 dev (default)

Trains on top of `FLUX.1-dev` and produces a LoRA usable with any Flux 1 dev workflow.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "flux1",
      "modelVariant": "dev",
      "epochs": 5,
      "resolution": 1024,
      "lr": 0.0001,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 16,
      "networkAlpha": 16,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/6/2657604TrainingData.EYBd.zip",
        "count": 10
      },
      "samples": {
        "prompts": ["a photo of TOK", "TOK in a garden", "TOK portrait"]
      }
    }
  }]
}
```

## Flux 1 schnell

Trains on top of `FLUX.1-schnell`. Inference uses 4 steps and `cfgScale: 0` — the output LoRA is meant to be used in those conditions.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "flux1",
      "modelVariant": "schnell",
      "epochs": 5,
      "lr": 0.0001,
      "trainTextEncoder": false,
      "networkDim": 16,
      "networkAlpha": 16,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/6/2657604TrainingData.EYBd.zip",
        "count": 10
      },
      "samples": { "prompts": ["a photo of TOK", "TOK in a garden"] }
    }
  }]
}
```

## Common parameters {#common-parameters}

Shared by both Flux 1 variants. Defaults shown are after `ApplyDefaults`.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `engine` | ✅ | — | Always `ai-toolkit`. |
| `ecosystem` | ✅ | — | Always `flux1` for this page. |
| `modelVariant` | ✅ | — | `dev` or `schnell`. Determines the base checkpoint. |
| `epochs` | | `5` | `1`–`20`. Billed per epoch. |
| `numberOfRepeats` | | auto: `ceil(200 / count)` | `1`–`5000`. |
| `lr` | | `0.0001` | UNet learning rate. Flux 1 is sensitive to high LRs — keep ≤ `0.0005`. |
| `trainTextEncoder` | | `false` | Flux 1 does not benefit much from text-encoder training. Leave off. |
| `lrScheduler` | | `cosine` | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. |
| `optimizerType` | | `adamw8bit` | `adamw`, `adamw8bit`, `adam8bit`, `lion`, `lion8bit`, `adafactor`, `adagrad`, `prodigy`, `prodigy8bit`, `automagic`. |
| `networkDim` | | `16` | `1`–`256`. Flux 1's lower default reflects how compactly Flux LoRAs encode style/character vs. SD-family. |
| `networkAlpha` | | matches `networkDim` | `1`–`256`. |
| `noiseOffset` | | `0` | `0`–`1`. |
| `flipAugmentation` | | `false` | Random horizontal flips. |
| `shuffleTokens` / `keepTokens` | | `false` / `0` | Caption-tag shuffling. |
| `triggerWord` | | *(none)* | Activation token. Recommended for character / style LoRAs. |
| `trainingData.{type, sourceUrl, count}` | ✅ | — | Always `type: "zip"`. |
| `samples.prompts[]` | | `[]` | Preview prompts rendered after each epoch using the trained LoRA at strength 1.0. |
| `samples.negativePrompt` | | *(none)* | — |

## Reading the result

Same envelope as the other training recipes — see [SDXL/SD1 → Reading the result](./training-sdxl-sd1#reading-the-result) for the full shape. The relevant bit:

```json
{
  "output": {
    "moderationStatus": "Approved",
    "epochs": [
      {
        "epochNumber": 1,
        "model": { "id": "blob_...", "url": "https://.../epoch_1.safetensors" },
        "samples": [{ "id": "blob_...", "url": "https://.../sample_0.jpeg" }]
      }
    ]
  }
}
```

The `model` blob is your trained LoRA. Download it (URLs are signed and expire), or use it in [Flux 1 image generation](./flux1) by referencing its AIR in the `loras` field.
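
A short Python sketch of picking the most recent epoch's LoRA out of that output, assuming the `output` object has been parsed into a dict. The helper name is illustrative; the field names come from the JSON above.

```python
from typing import Any


def latest_epoch_model_url(output: dict[str, Any]) -> str:
    """Return the signed URL of the LoRA from the highest-numbered epoch."""
    epochs = output.get("epochs", [])
    if not epochs:
        raise ValueError("training output has no epochs yet")
    latest = max(epochs, key=lambda epoch: epoch["epochNumber"])
    return latest["model"]["url"]


output = {
    "moderationStatus": "Approved",
    "epochs": [
        {"epochNumber": 1, "model": {"id": "blob_...", "url": "https://.../epoch_1.safetensors"}},
        {"epochNumber": 2, "model": {"id": "blob_...", "url": "https://.../epoch_2.safetensors"}},
    ],
}
print(latest_epoch_model_url(output))
```

The last epoch is not always the best one; comparing the per-epoch `samples` before choosing is often worthwhile.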

## Runtime

Per-epoch wall time on a 10-image dataset, default settings:

| Variant | Per-epoch | 5-epoch full run |
|---------|-----------|-------------------|
| `dev` | ~60–120 s | 5–15 min |
| `schnell` | ~60–120 s | 5–15 min |

Always use `wait=0`.

## Cost

```
total = 200 × epochs   (Buzz)
```

| Configuration | Buzz |
|---------------|------|
| `epochs: 5` | 1000 + samples |
| `epochs: 10` | 2000 + samples |
| `epochs: 20` (max) | 4000 + samples |

Sample-prompt rendering is billed separately at the appropriate Flux 1 generation rate. Run with `whatif=true` to see the exact pre-flight charge.
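
The two pieces of arithmetic on this page, the per-epoch cost and the automatic `numberOfRepeats` default of `ceil(200 / count)`, can be sketched in Python as follows. Function names are illustrative, and the cost covers training only, not sample rendering.

```python
import math

BUZZ_PER_EPOCH = 200  # Flux 1 rate quoted on this page


def training_cost(epochs: int) -> int:
    """Buzz for the training itself; sample rendering is billed separately."""
    if not 1 <= epochs <= 20:
        raise ValueError("epochs must be between 1 and 20")
    return BUZZ_PER_EPOCH * epochs


def default_repeats(image_count: int) -> int:
    """The automatic numberOfRepeats value: ceil(200 / count)."""
    if image_count < 1:
        raise ValueError("image_count must be at least 1")
    return math.ceil(200 / image_count)


print(training_cost(5))     # 1000
print(default_repeats(10))  # 20
```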

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "modelVariant required" | Missing `modelVariant` field | Set to `"dev"` or `"schnell"`. |
| `400` with "epochs out of range" | `epochs` outside `1`–`20` | Cap at 20. |
| `400` with "trainingData.sourceUrl not reachable" | Signed URL expired | Regenerate. Prefer Civitai R2 AIRs over signed URLs for long-lived references. |
| Trained LoRA underbaked | Too few epochs for dataset, or `lr` too low | Raise `epochs` to 8–12 for character LoRAs; keep `lr` at `0.0001`–`0.0003`. |
| Trained LoRA overfits | Too many epochs / too high `networkDim` | Lower `epochs`, drop `networkDim` to 8–12. |
| Step `failed`, output `moderationStatus: "Rejected"` | Dataset failed content moderation | Replace flagged images. |

## Related

* [SDXL & SD1 LoRA training](./training-sdxl-sd1) — cheaper, classic SD ecosystems
* [Flux 2 Klein LoRA training](./training-flux2-klein) — current Flux generation, including image-edit training
* [Flux 1 image generation](./flux1) — use a trained LoRA via `loras: { "<lora-air>": 1.0 }`
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running training jobs
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) / [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow)
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/training/openapi.yaml)

---

---
url: /orchestration/recipes/flux2.md
---

# Flux 2 image generation

Flux 2 is Black Forest Labs' latest image-generation family. The orchestrator exposes every shipped variant under the `imageGen` step, selected by the `model` field:

| `model` | Best for | Notes |
|---------|----------|-------|
| `klein` | **Default**: cheapest variant, and capable enough for almost every workload | Supports `createImage` / `createVariant` / `editImage`. Two size tiers (`4b` / `9b`). Takes LoRAs. Runs on Civitai infra. |
| `dev` | Higher fidelity when Klein isn't enough, with LoRA support | Supports `createImage` / `editImage`. Exposes `guidanceScale` + `numInferenceSteps`. |
| `flex` | Mid-tier quality, faster than `dev` | Supports `createImage` / `editImage`. Fewer tunable knobs. |
| `pro` | Commercial tier — routed through BFL's provider | Supports `createImage` / `editImage`. No LoRAs. |
| `max` | Top commercial tier — premium hero shots | Supports `createImage` / `editImage`. Slowest + most expensive. |

**Default choice for new integrations**: `model: "klein"`, `modelVersion: "4b"`. Upgrade to `9b` when you want more fidelity on the same variant, step to `dev` for open-weights Flux 2 with the official sampler, or `pro` / `max` for BFL-managed commercial output.

## The request shape

Every Flux 2 request is a single `imageGen` step with three keys selecting the variant and operation:

```json
{
  "$type": "imageGen",
  "input": {
    "engine":    "flux2",
    "model":     "klein",       // klein | dev | flex | pro | max
    "operation": "createImage"  // createImage | editImage
  }
}
```

The orchestrator dispatches to the matching input schema (`Flux2KleinCreateImageInput`, `Flux2DevEditImageInput`, …), so only the fields valid for that combination are accepted — [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) will `400` on unknown ones.

::: tip `createVariant` on Klein and Dev
The native `flux2` engine exposes `createImage` and `editImage` on every model. If you want strength-weighted img2img (`createVariant`), **Klein** and **Dev** each have a second invocation path via `engine: "sdcpp"` + `ecosystem: "flux2Klein"` / `"flux2Dev"` — same models, extra operations. See [Klein → createVariant](#klein-createvariant-img2img) and [Dev createVariant](#dev-createvariant-img2img-via-sdcpp) below.
:::

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* For `editImage` / `createVariant` operations: one or more source image URLs, data URLs, or Base64 strings

## Klein (default)

Klein is the cost/capability sweet spot for almost every Flux 2 workload: cheap enough to generate at scale and capable enough for production output. Along with `dev`, it also supports `createVariant` through the sdcpp path. It comes in two size tiers (`4b` and `9b`), each with additional checkpoint variants:

| `modelVersion` | Typical use |
|----------------|-------------|
| `4b` (default) | Fastest, cheapest. Great default. |
| `4b-base` | Un-tuned 4b checkpoint — useful for custom fine-tuning, not for direct generation. |
| `9b` | Higher fidelity at higher cost. Step up from `4b` when quality matters more than throughput. |
| `9b-base` | Un-tuned 9b checkpoint, same caveats as `4b-base`. |
| `9b-kv` | 9b with key-value caching (ComfyUI worker only). Rare; use when a worker explicitly requires it. |

### Text-to-image (`createImage`)

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "flux2",
      "model": "klein",
      "operation": "createImage",
      "modelVersion": "4b",
      "prompt": "A cozy cabin in the woods at sunset, cinematic lighting",
      "width": 1024,
      "height": 1024,
      "cfgScale": 5,
      "steps": 20
    }
  }]
}
```

Klein-specific parameters:

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `modelVersion` | `4b` | `4b` / `4b-base` / `9b` / `9b-base` / `9b-kv` | Size tier. `4b` is the default workload pick. |
| `cfgScale` | `5` | `1`–`20` | Classifier-free guidance. `4`–`6` is the sweet spot on Klein. |
| `steps` | `20` | `4`–`50` | Sampler steps. Klein is efficient — 20 is usually plenty. |
| `sampleMethod` | `euler` | enum | [`SdCppSampleMethod`](/orchestration/reference/). |
| `schedule` | `simple` | enum | [`SdCppSchedule`](/orchestration/reference/). |
| `negativePrompt` | *(none)* | string | Available on Klein — not exposed on `dev` / `flex` / `pro` / `max`. |
| `loras` | `{}` | `{ airUrn: strength }` | Stack multiple; strengths in `0.0`–`2.0` are typical. |

Plus the shared Flux 2 fields (`prompt`, `width`, `height`, `seed`, `quantity`, `outputFormat`, `enablePromptExpansion`) — see [Common parameters](#common-parameters).

### Bumping up to 9b

When `4b` isn't delivering enough fidelity, switch `modelVersion` to `9b` — same shape, same knobs, just a heavier model:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "flux2",
      "model": "klein",
      "operation": "createImage",
      "modelVersion": "9b",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "width": 1024,
      "height": 1536,
      "cfgScale": 5,
      "steps": 24
    }
  }]
}
```

### With LoRAs

Flux 2 Klein LoRAs are a map of AIR URN → strength (same shape as [Flux 1](./flux1)):

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "flux2",
      "model": "klein",
      "operation": "createImage",
      "modelVersion": "4b",
      "prompt": "A detailed anime character in a magical forest, ethereal lighting",
      "width": 1024,
      "height": 1024,
      "cfgScale": 5,
      "steps": 20,
      "loras": {
        "urn:air:flux2:lora:civitai:2169780@2443422": 1.0
      }
    }
  }]
}
```

Browse the [Civitai Flux 2 LoRA catalog](https://civitai.com/models?baseModels=Flux.2+D) for AIR URNs.

### Edit image (`editImage`)

Pass `images[]` (up to 2 entries) alongside a prompt treated as an edit instruction:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "flux2",
      "model": "klein",
      "operation": "editImage",
      "modelVersion": "4b",
      "prompt": "Make it a winter scene with snow falling",
      "width": 1024,
      "height": 1024,
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

### Klein createVariant (img2img) {#klein-createvariant-img2img}

The native `engine: "flux2"` path doesn't expose `createVariant` on Klein, but there's a second invocation path that does: `engine: "sdcpp"` + `ecosystem: "flux2Klein"`. Same model, same LoRAs, same size tiers — adds `createVariant` with `image` (single source) + `strength`:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "flux2Klein",
      "operation": "createVariant",
      "modelVersion": "4b",
      "prompt": "Make it daytime with clear blue sky",
      "width": 1024,
      "height": 1024,
      "image": "https://image.civitai.com/.../source.jpeg",
      "strength": 0.7
    }
  }]
}
```

`strength` controls how far the result departs from the source: `0.0` returns the source unchanged, `1.0` discards it entirely. `0.6`–`0.8` is the "keep composition, change style" sweet spot.

The sdcpp path also supports `createImage` and `editImage` on Klein with the same field shapes shown above under the native `flux2` engine — just swap `engine: "flux2", model: "klein"` for `engine: "sdcpp", ecosystem: "flux2Klein"`. Most users can stay on the native `flux2` engine; reach for the sdcpp path when you need `createVariant`.
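
The swap described above is mechanical, so it can be sketched as a small Python helper. The function name is illustrative, and it assumes, as the paragraph above states, that every other field keeps the same shape on both paths.

```python
from typing import Any


def to_sdcpp_klein(native_input: dict[str, Any]) -> dict[str, Any]:
    """Rewrite a native flux2/klein input to the sdcpp + flux2Klein path."""
    if native_input.get("engine") != "flux2" or native_input.get("model") != "klein":
        raise ValueError("expected an input with engine 'flux2' and model 'klein'")
    # Drop the native selectors and keep every other field untouched.
    rest = {k: v for k, v in native_input.items() if k not in ("engine", "model")}
    return {"engine": "sdcpp", "ecosystem": "flux2Klein", **rest}


native = {
    "engine": "flux2",
    "model": "klein",
    "operation": "createImage",
    "modelVersion": "4b",
    "prompt": "A cozy cabin in the woods at sunset, cinematic lighting",
}
print(to_sdcpp_klein(native))
```

The helper returns a new dict and leaves the original input unmodified, so the same base request can be sent down either path.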

## Dev — higher-fidelity open-weights

When Klein isn't delivering and you want open-weights quality, `dev` is the next step up. Supports LoRAs; exposes the native Flux 2 sampler interface (`guidanceScale`, `numInferenceSteps`):

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "flux2",
      "model": "dev",
      "operation": "createImage",
      "prompt": "A majestic cat sitting on a throne, highly detailed, 8k",
      "width": 1024,
      "height": 1024,
      "quantity": 1,
      "guidanceScale": 2.5,
      "numInferenceSteps": 28
    }
  }]
}
```

Dev-specific parameters:

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `guidanceScale` | `2.5` | `0`–`20` | Lower = more creative, higher = sticks closer to the prompt. `2.5`–`4.0` is the sweet spot. |
| `numInferenceSteps` | `28` | `4`–`50` | Sampler steps. Diminishing returns past ~30. |
| `loras[]` | `[]` | array of `{ air, strength }` | **Note the shape difference from Klein**: Dev uses an *array* of `{ air, strength }` objects; Klein uses a *dict*. |

Dev also supports `operation: "editImage"` with `images[]` — same shape as Klein's edit, just on the richer sampler surface.
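
Since the two `loras` shapes carry the same information, converting between them is a one-liner. The Python sketch below only reshapes the data (the function name is illustrative); whether a particular LoRA is compatible with both models is a separate question it does not answer.

```python
def klein_loras_to_dev(loras: dict[str, float]) -> list[dict[str, object]]:
    """Turn Klein's {airUrn: strength} dict into Dev's [{air, strength}] array."""
    return [{"air": air, "strength": strength} for air, strength in loras.items()]


klein_loras = {"urn:air:flux2:lora:civitai:2169780@2443422": 1.0}
print(klein_loras_to_dev(klein_loras))
```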

### Dev createVariant (img2img) via sdcpp

Like Klein, the native `engine: "flux2"` path doesn't expose `createVariant` on Dev — but there's a second invocation path that does: `engine: "sdcpp"` + `ecosystem: "flux2Dev"`. Same model, same LoRA support, with `image` (single source) + `strength` for img2img:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "flux2Dev",
      "operation": "createVariant",
      "prompt": "Make it daytime with clear blue sky",
      "width": 1024,
      "height": 1024,
      "image": "https://image.civitai.com/.../source.jpeg",
      "strength": 0.7
    }
  }]
}
```

`strength` runs `0.0`–`1.0` (default `0.7`). The sdcpp path also supports `createImage` and `editImage` on Dev — most users can stay on the native `flux2` engine; reach for the sdcpp path when you need `createVariant`.

## Flex — faster, lighter

Mid-tier quality, tuned for throughput. Uses the same `guidanceScale` / `numInferenceSteps` sampler fields as `dev`, at slightly lower fidelity:

```json
{
  "$type": "imageGen",
  "input": {
    "engine": "flux2",
    "model": "flex",
    "operation": "createImage",
    "prompt": "A serene mountain landscape with a crystal clear lake at dawn",
    "width": 1024,
    "height": 1024,
    "guidanceScale": 3.5,
    "numInferenceSteps": 28
  }
}
```

Also supports `editImage`.

## Pro — BFL commercial tier

Routed through Black Forest Labs' production provider. No LoRAs, no sampler knobs — just prompt in, image out. Use when Klein / dev don't meet quality needs and you're willing to pay for BFL-managed output:

```json
{
  "$type": "imageGen",
  "input": {
    "engine": "flux2",
    "model": "pro",
    "operation": "createImage",
    "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
    "width": 1024,
    "height": 1536
  }
}
```

## Max — BFL flagship

Top commercial tier. Slowest and most expensive. Use for hero shots where quality matters more than throughput:

```json
{
  "$type": "imageGen",
  "input": {
    "engine": "flux2",
    "model": "max",
    "operation": "createImage",
    "prompt": "An epic fantasy battle scene with dragons, cinematic lighting, intricate details",
    "width": 1536,
    "height": 1024
  }
}
```

Same shape as `pro`; heavier backing model.

## Common parameters {#common-parameters}

These apply across all Flux 2 models (per the [`Flux2ImageGenInput` schema](/orchestration/reference/operations/InvokeImageGenStepTemplate)):

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `prompt` | ✅ | — | ≤ 1000 characters. Natural-language descriptions work best — include lighting, composition, camera/lens cues. |
| `width` | | `1024` | `512`–`2048`. Klein requires divisible by 16; other models have no divisibility constraint. |
| `height` | | `1024` | `512`–`2048`. Klein requires divisible by 16; other models have no divisibility constraint. |
| `quantity` | | `1` | `1`–`4`. Number of images returned per call. |
| `outputFormat` | | `jpeg` | `jpeg` or `png`. `png` for lossless, `jpeg` for smaller files. |
| `seed` | | random | `int64`. Pin for reproducibility. |
| `enablePromptExpansion` | | `false` | Model-side prompt expansion — Flux rewrites your prompt before generation. Off by default. |

For `editImage` operations, add `images[]` (up to 2 entries on Klein) — HTTP(S) URLs, data URLs, or Base64 strings.
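
The `width` / `height` rule for Klein in the table above (within `512`–`2048`, divisible by 16) can be enforced client-side before submitting. A minimal Python sketch with an illustrative function name:

```python
def snap_klein_dimension(value: int) -> int:
    """Clamp to 512-2048, then round to the nearest multiple of 16 (ties round up)."""
    clamped = max(512, min(2048, value))
    return (clamped + 8) // 16 * 16


print(snap_klein_dimension(1000))  # 1008
print(snap_klein_dimension(500))   # 512
print(snap_klein_dimension(3000))  # 2048
```

Both bounds are themselves multiples of 16, so the rounded value always stays inside the allowed range.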

## Reading the result

A successful `imageGen` step emits an `images[]` array — one entry per `quantity`:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.jpeg" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Runtime

Rough ranges on Civitai-hosted infra (warm node, queue permitting):

| Variant | Typical wall time per 1024×1024 image | `wait` recommendation |
|---------|---------------------------------------|-----------------------|
| `klein` (`4b`) | 5–15 s | `wait=60` fine for `quantity: 1` |
| `klein` (`9b`) | 10–25 s | `wait=60` usually fine |
| `dev`, `flex` | 10–30 s | `wait=60` usually works for `quantity ≤ 2` |
| `pro`, `max` | 15–60 s depending on BFL queue | `wait=60` works sometimes; fall back to `wait=0` + polling on busy periods |

With more than ~2 images, large dimensions, or `pro`/`max` on a busy queue, you risk hitting the 100 s request timeout. Submit with `wait=0` and poll or use a webhook instead.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` to preview the exact charge before submitting; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

**Klein** (`Flux2KleinSdCppImageGenInput.CalculateCost`) — driven by `modelVersion`, `steps`, and `cfgScale`:

```
base     = stepCost × steps × (editImages + 1) × (cfgScale == 1 ? 1 : 2)
stepCost = 0.3 (4b / 4b-base), 0.5 (9b / 9b-base)
total    = base × quantity
```

| Variant | Shape | Buzz |
|---------|-------|------|
| Klein `4b`, `createImage`, `steps: 20`, `cfgScale: 5`, `quantity: 1` | default | **~12** |
| Klein `4b`, `createImage`, `quantity: 4` | batch | ~48 |
| Klein `4b`, `editImage` with 1 reference | edit | ~24 |
| Klein `9b`, `createImage`, `steps: 24`, `cfgScale: 5` | upgrade | **~24** |
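
The Klein formula is easy to reproduce client-side for rough budgeting. The sketch below mirrors the pseudocode above; the function and argument names are illustrative, and `whatif=true` remains the authoritative source for the actual charge.

```python
STEP_COST = {"4b": 0.3, "4b-base": 0.3, "9b": 0.5, "9b-base": 0.5}


def klein_cost(model_version: str, steps: int, cfg_scale: float,
               edit_images: int = 0, quantity: int = 1) -> float:
    """Estimate the Klein Buzz cost from the documented formula."""
    # cfgScale of exactly 1 costs a single pass; any other value doubles it.
    cfg_multiplier = 1 if cfg_scale == 1 else 2
    base = STEP_COST[model_version] * steps * (edit_images + 1) * cfg_multiplier
    return base * quantity


# First table row: 4b, steps 20, cfgScale 5, quantity 1.
print(round(klein_cost("4b", steps=20, cfg_scale=5), 2))
```

Passing `edit_images=1` or `quantity=4` with the same settings reproduces the edit and batch rows of the table.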

**Dev / Flex / Pro / Max** use a per-megapixel formula — `ceil(width × height / 1 000 000) × costPerMegapixel × quantity`, where `costPerMegapixel` doubles when LoRAs are present on `dev`. A default 1024² `dev` createImage lands at **~40 Buzz**; expect commercial-tier variants (`pro`, `max`) to be materially higher. Run `whatif=true` when pricing matters.
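
The per-megapixel formula can be sketched the same way. Note the rounding: a 1024×1024 image is 1,048,576 pixels, so it bills as 2 megapixels. The per-megapixel rate is not listed on this page, so it is a required argument here; the value `20` in the usage line is a placeholder back-derived from the ~40 Buzz figure, not a documented rate.

```python
import math


def megapixel_cost(width: int, height: int, cost_per_megapixel: float,
                   quantity: int = 1, has_loras: bool = False) -> float:
    """ceil(width * height / 1_000_000) * costPerMegapixel * quantity."""
    megapixels = math.ceil(width * height / 1_000_000)
    # Per the text above, the rate doubles when LoRAs are present on dev.
    rate = cost_per_megapixel * (2 if has_loras else 1)
    return megapixels * rate * quantity


# Placeholder rate of 20 (assumption); 1024 x 1024 rounds up to 2 megapixels.
print(megapixel_cost(1024, 1024, cost_per_megapixel=20))
```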

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "prompt must be less than 1000" | Too long | Trim; 500 chars is plenty for most prompts. |
| `400` with "width/height out of range" | Outside `512`–`2048`, or not divisible by 8 (16 on Klein) | Round to a valid multiple. |
| `400` with unexpected property | Field not valid for this `model`/`operation` (e.g. `loras` on `pro`, `guidanceScale` on `klein`, `cfgScale` on `dev`) | Match the schema for your chosen variant — see the tables above. Klein uses `cfgScale`/`steps`/`sampleMethod`; dev/flex use `guidanceScale`/`numInferenceSteps`. |
| `400` with "createVariant is not a valid operation" on Klein / Dev (native `flux2` engine) | Native `flux2` engine only exposes `createImage` + `editImage` | Use `engine: "sdcpp"` + `ecosystem: "flux2Klein"` or `"flux2Dev"` to access `createVariant`. See [Klein createVariant](#klein-createvariant-img2img) or [Dev createVariant](#dev-createvariant-img2img-via-sdcpp). |
| `400` with "LoRA not found" | AIR URN wrong or model private / not published | Verify the URN on the model's Civitai page. |
| Output ignores the prompt | `enablePromptExpansion: true` with a short prompt; or guidance too low | Set `enablePromptExpansion: false` and/or raise `cfgScale` (Klein) / `guidanceScale` (dev, flex). |
| Request timed out (`wait` expired) | Large `quantity`, `max`/`pro` on a busy queue | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Prompt or input image hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Flux 1 image generation](./flux1) — classic Flux.1 family (sdcpp, Comfy, Kontext editing)
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref`
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling longer runs
* Full parameter catalog: the `Flux2<Model><Operation>Input` and `Flux2KleinSdCpp<Operation>Input` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator

---

---
url: /orchestration/recipes/training-flux2-klein.md
---

# Flux 2 Klein LoRA training

Train a Flux 2 Klein LoRA for use with the [Flux 2 image generation](./flux2) recipe. Two size tiers, plus a special **edit-training** mode for image-editing LoRAs that take control / reference images at inference time.

| `modelVariant` | Base | Buzz / epoch | Use when |
|----------------|------|--------------|----------|
| `4b` (default) | `FLUX.2-klein-base-4B` | 50 | Cheaper / faster training. Pairs with Klein `4b` inference. |
| `9b` | `FLUX.2-klein-base-9B` | 100 | Higher fidelity. Pairs with Klein `9b` inference. |

The base checkpoint is fixed by `modelVariant`; there is no `model` field on the input. Set `isEditTraining: true` to train an editing LoRA — the dataset zip layout changes (see [Edit training](#edit-training)).

::: tip Long-running step
Always submit with `wait=0`. Klein training takes roughly 10–90 s per epoch on a 10-image dataset depending on the variant (see [Runtime](#runtime)); a 5-epoch run typically finishes in 1–5 minutes for `4b` and 5–15 minutes for `9b`.
:::

## The request shape

```json
{
  "$type": "training",
  "input": {
    "engine":         "ai-toolkit",
    "ecosystem":      "flux2klein",
    "modelVariant":   "4b",         // 4b | 9b
    "isEditTraining": false         // optional, defaults to false
  }
}
```

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A training-data zip:
  * For standard training: a flat zip of training images
  * For [edit training](#edit-training): a zip with `main/`, `control_1/`, `control_2/`, `control_3/` subfolders

## Klein 4b (default)

Fastest and cheapest tier. Default for most LoRAs.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "flux2klein",
      "modelVariant": "4b",
      "epochs": 1,
      "lr": 0.0005,
      "trainTextEncoder": false,
      "lrScheduler": "constant",
      "optimizerType": "adamw8bit",
      "networkDim": 2,
      "networkAlpha": 1,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/6/2658016TrainingData.1zGG.zip",
        "count": 15
      },
      "samples": {
        "prompts": [
          "fruit, food, no humans, blue eyes, solo, leaf, strawberry, fangs, pokemon (creature)",
          "no humans, pokemon (creature), cup, food, solo, bird, blush, blurry, animal focus",
          "no humans, candle, pokemon (creature), blurry, animal focus, solo, food, bird, standing"
        ]
      }
    }
  }]
}
```

## Klein 9b

Same shape, larger base model. Recommended `epochs: 5`+, `networkDim: 32`, `lr: ~1e-4`.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "flux2klein",
      "modelVariant": "9b",
      "epochs": 5,
      "resolution": 1024,
      "lr": 0.000102,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/6/2657604TrainingData.EYBd.zip",
        "count": 1
      },
      "samples": { "prompts": [] }
    }
  }]
}
```

## Edit training {#edit-training}

Setting `isEditTraining: true` produces an **editing LoRA** — at inference time it takes one or more reference images alongside the prompt and modifies them. The dataset zip layout differs:

* `main/` — target images (what the LoRA should produce)
* `control_1/`, `control_2/`, `control_3/` — reference / source images that pair with each `main/` entry

Filenames inside the subfolders must align across folders. The result is a LoRA that works with [Flux 2 Klein → editImage](./flux2#edit-image-editimage).

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training", "edit"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "flux2klein",
      "modelVariant": "4b",
      "isEditTraining": true,
      "epochs": 3,
      "lr": 0.0001,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "https://blobs-temp.sfo3.digitaloceanspaces.com/flux2_klein_edit_testdata.zip",
        "count": 3
      },
      "samples": {
        "prompts": [
          "a portrait of a woman standing in a sunlit garden with flowers",
          "a landscape painting of rolling hills at sunset",
          "a painting of a cat sitting on a windowsill looking outside at a rainy day"
        ],
        "sourceImages": [
          "https://blobs-temp.sfo3.digitaloceanspaces.com/sample-edit-source-1.jpg",
          "https://blobs-temp.sfo3.digitaloceanspaces.com/sample-edit-source-2.jpg"
        ]
      }
    }
  }]
}
```

`samples.sourceImages` is required for edit training when you want preview samples — the listed URLs become the reference images for the per-epoch sample renders.
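
Before uploading an edit-training dataset, it can help to check the zip layout locally. The sketch below is a hypothetical pre-flight check, not the server's validator: it assumes files sit directly inside `main/` and `control_*/`, and it compares filenames without their extension, which may be stricter or looser than the real validation.

```python
import zipfile
from pathlib import PurePosixPath


def check_edit_zip(path: str) -> list[str]:
    """Return a list of layout problems in an edit-training zip (empty if none)."""
    stems: dict[str, set[str]] = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            # Only consider files exactly one folder deep, e.g. main/a.png.
            if len(parts) == 2 and not name.endswith("/"):
                stems.setdefault(parts[0], set()).add(PurePosixPath(parts[1]).stem)

    problems: list[str] = []
    main = stems.get("main", set())
    controls = {folder: s for folder, s in stems.items() if folder.startswith("control_")}
    if not main:
        problems.append("main/ is missing or empty")
    if not controls:
        problems.append("no control_*/ folders found")
    for stem in sorted(main):
        # Each target needs a same-named file in at least one control folder.
        if not any(stem in s for s in controls.values()):
            problems.append(f"main/{stem} has no matching control image")
    return problems
```

An empty list means every `main/` entry has a same-named file in at least one `control_*/` folder.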

## Common parameters {#common-parameters}

Defaults shown are the post-`ApplyDefaults` values for Klein.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `engine` | ✅ | — | Always `ai-toolkit`. |
| `ecosystem` | ✅ | — | Always `flux2klein` for this page. |
| `modelVariant` | ✅ | — | `4b` or `9b`. |
| `isEditTraining` | | `false` | When `true`, dataset zip must contain `main/` + `control_*/` subfolders. |
| `epochs` | | `5` | `1`–`20`. Billed per epoch. |
| `numberOfRepeats` | | auto: `ceil(200 / count)` | `1`–`5000`. |
| `lr` | | `0.0001` | Klein is sensitive to high LRs — keep in `1e-4`–`5e-4`. |
| `trainTextEncoder` | | `false` | Klein uses Qwen-3 as its text encoder; AI Toolkit does not train it. Leave `false`. |
| `lrScheduler` | | `cosine` | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. |
| `optimizerType` | | `adamw8bit` | See the [SDXL/SD1 page](./training-sdxl-sd1) for the full enum. |
| `networkDim` | | `32` | `1`–`256`. Klein LoRAs are typically `16`–`32`. |
| `networkAlpha` | | matches `networkDim` | `1`–`256`. |
| `noiseOffset` | | `0` | `0`–`1`. |
| `flipAugmentation` | | `false` | Random horizontal flips. |
| `shuffleTokens` / `keepTokens` | | `false` / `0` | Caption-tag shuffling. |
| `triggerWord` | | *(none)* | Activation token. |
| `trainingData.{type, sourceUrl, count}` | ✅ | — | `type: "zip"`. For edit training, `count` should equal the number of `main/` entries. |
| `samples.prompts[]` | | `[]` | Per-epoch preview prompts. |
| `samples.negativePrompt` | | *(none)* | — |
| `samples.sourceImages[]` | | `[]` | Edit-training only — reference images for sample renders. |
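
As a worked example of the `numberOfRepeats` auto value, the snippet below applies `ceil(200 / count)` to the three `count` values used in the requests on this page. The function name is illustrative.

```python
import math


def default_repeats(count: int) -> int:
    """Auto value for numberOfRepeats when the field is omitted: ceil(200 / count)."""
    return math.ceil(200 / count)


for count in (15, 3, 1):
    print(count, default_repeats(count))
```

Smaller datasets get proportionally more repeats, so a 15-image set repeats 14 times while a single image repeats 200 times.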

## Reading the result

Same envelope as the other training recipes — see [SDXL/SD1 → Reading the result](./training-sdxl-sd1#reading-the-result). Each epoch yields a Klein LoRA `.safetensors` blob plus any sample images.

To use the trained LoRA, register it on Civitai (or reference its blob URN directly) and pass it under `loras` in a [Flux 2 Klein generation](./flux2#klein-default) request:

```json
{
  "$type": "imageGen",
  "input": {
    "engine": "flux2",
    "model": "klein",
    "operation": "createImage",
    "modelVersion": "4b",
    "prompt": "your prompt",
    "loras": { "urn:air:flux2:lora:civitai:<id>@<version>": 1.0 }
  }
}
```

## Runtime

Per-epoch wall time, default settings on a 10-image dataset:

| Variant | Per-epoch | Typical full run |
|---------|-----------|-------------------|
| `4b` | ~10–30 s | 1–5 min for 5 epochs |
| `9b` | ~30–90 s | 5–15 min for 5 epochs |
| `4b` + `isEditTraining` | ~20–45 s | 2–8 min for 5 epochs (more steps per epoch) |

Always use `wait=0`.

## Cost

```
total = costPerEpoch × epochs
costPerEpoch = 50 (4b), 100 (9b)
```

| Configuration | Buzz |
|---------------|------|
| Klein `4b`, `epochs: 5` | 250 + samples |
| Klein `4b`, `epochs: 1` | 50 + samples |
| Klein `9b`, `epochs: 5` | 500 + samples |
| Klein `9b`, `epochs: 10` | 1000 + samples |

Sample-prompt rendering is billed separately at Klein image-generation rates (~8 Buzz per sample for `4b`, ~16 for `9b`). Run with `whatif=true` to see exact charges.
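
For budgeting, the base training charge can be computed up front. This sketch covers only the per-epoch formula above; sample renders are deliberately left out because their exact count and price depend on the request. The function name is illustrative.

```python
COST_PER_EPOCH = {"4b": 50, "9b": 100}


def training_cost(model_variant: str, epochs: int) -> int:
    """Base training charge in Buzz, excluding sample renders."""
    if not 1 <= epochs <= 20:
        raise ValueError("epochs must be between 1 and 20")
    return COST_PER_EPOCH[model_variant] * epochs


print(training_cost("4b", 5), training_cost("9b", 10))
```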

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "modelVariant required" | Missing `modelVariant` | Set to `"4b"` or `"9b"`. |
| `400` with "isEditTraining: true requires control folders" | Edit-training zip missing `control_*/` subfolders | Repackage the zip with `main/`, `control_1/`, `control_2/`, `control_3/`. Filenames must align across folders. |
| Step `failed` mentioning "training data validation" | Edit-training zip filenames don't match across `main/` and `control_*/` | Ensure the same basenames appear in `main/` and at least one `control_*/` folder. |
| Trained LoRA underbaked | Too few epochs / too low `lr` | Raise `epochs` to 5–10, or raise `lr` while keeping it ≤ `5e-4`. |
| Trained LoRA overcooked / broken samples | `lr` too high | Drop `lr` to `1e-4`–`2e-4`. |
| Step `failed`, `moderationStatus: "Rejected"` | Dataset failed content moderation | Replace flagged images. |

## Related

* [Flux 1 LoRA training](./training-flux1) — open-weights Flux LoRAs (Dev / Schnell)
* [SDXL & SD1 LoRA training](./training-sdxl-sd1) — cheaper SD-family ecosystems
* [Flux 2 image generation](./flux2) — use a trained Klein LoRA via `loras: { ... }` on `model: "klein"`
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running training jobs
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) / [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow)
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/training/openapi.yaml)

---

---
url: /orchestration/recipes/gemini.md
---

# Gemini image generation

The `gemini` engine exposes **Gemini 2.5 Flash Image** — the same underlying model product as Google's [`nano-banana-*`](./google) variants, but via the direct Gemini API rather than Vertex AI. Simpler input shape: no aspect-ratio or resolution picker, just prompt (+ optional reference images for edits) and a `quantity`. Uses `operation` as the discriminator, mirroring most other imageGen engines.

::: tip Gemini vs Google
If you want explicit aspect-ratio control, resolution tiers, or web-search grounding, use the [`google` engine](./google) with `model: "nano-banana-2"` — same product, richer controls. Pick `gemini` when you want the minimal shape and the direct Gemini-API semantics.
:::

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* For `editImage`: 1–4 source images (URLs, data URLs, or Base64)

## Text-to-image (`createImage`)

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "gemini",
      "model": "2.5-flash",
      "operation": "createImage",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "quantity": 1
    }
  }]
}
```

## Image editing (`editImage`)

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "gemini",
      "model": "2.5-flash",
      "operation": "editImage",
      "prompt": "Make it a winter scene with snow falling",
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

Pass 1–4 reference images in `images[]` — the prompt is treated as an edit instruction applied across them.

## Parameters

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `model` | ✅ | — | Only `2.5-flash` exposed today. |
| `operation` | ✅ | — | `createImage` or `editImage`. |
| `prompt` | ✅ | — | Natural-language. No explicit cap documented; keep it reasonable. |
| `quantity` | | `1` | `1`–`4`. |
| `images[]` | ✅ on `editImage` | — | 1–4 entries. URLs, data URLs, or Base64. |

No aspect-ratio control, no resolution tier, no seed, no safety toggle. If you need any of those, switch to the [`google` engine](./google) with `nano-banana-2`.
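
Because the shape is so small, a thin builder can enforce the limits from the table before a request is sent. This is a sketch: `gemini_input` is a hypothetical helper, and choosing the operation from the presence of `images` is a convenience of this example, not API behaviour.

```python
from typing import Any, Optional, Sequence


def gemini_input(prompt: str, images: Optional[Sequence[str]] = None,
                 quantity: int = 1) -> dict[str, Any]:
    """Build the `input` object for a gemini imageGen step."""
    if not 1 <= quantity <= 4:
        raise ValueError("quantity must be 1-4")
    step_input: dict[str, Any] = {
        "engine": "gemini",
        "model": "2.5-flash",
        "operation": "createImage",
        "prompt": prompt,
        "quantity": quantity,
    }
    if images is not None:
        # editImage requires between one and four source images.
        if not 1 <= len(images) <= 4:
            raise ValueError("editImage takes 1-4 images")
        step_input["operation"] = "editImage"
        step_input["images"] = list(images)
    return step_input
```

Wrap the returned object as `{"$type": "imageGen", "input": ...}` inside `steps` when submitting.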

## Reading the result

Standard `imageGen` output — an `images[]` array, one per `quantity`:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.png" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Runtime

Typical wall time 8–20 s per image including queue. `wait=60` is comfortable for `quantity: 1`–`2`; larger batches or busy periods warrant `wait=0` + polling.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Flat per-image:

```
total = 60 × quantity
```

| Shape | Buzz |
|-------|------|
| `createImage`, `quantity: 1` | **~60** |
| `editImage` with 1 reference | ~60 |
| `createImage`, `quantity: 4` | ~240 |

Gemini 2.5 Flash's price doesn't depend on resolution (there's no `resolution` field) or the number of reference images. The [`google` engine](./google)'s `nano-banana-2` offers the same product with tiered resolution pricing, but it costs more per image: ~104 Buzz at 1K versus 60 here, rising further for 2K / 4K.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with unknown property `aspectRatio` / `resolution` | Those fields live on the `google` engine, not `gemini` | Switch engines, or drop the field. |
| `400` with "images minItems" on `editImage` | Empty `images[]` | Include at least one source image when `operation: "editImage"`. |
| `400` with "images maxItems" | More than 4 source images | Trim to 4 — `google/nano-banana-2` accepts up to 10 if you need more. |
| Output doesn't look edited | Prompt described target state rather than the change | Phrase as an instruction (`"Make it a winter scene"`) rather than a description of the result. |
| Request timed out (`wait` expired) | Busy Gemini API queue | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Google's content filter | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Google image generation](./google) — Nano Banana / Imagen 4 via Vertex AI (alternate routing with richer controls)
* [OpenAI image generation](./openai) — alternative commercial tier
* [Flux 2](./flux2) / [Flux 1](./flux1) / [Qwen](./qwen) — open-weights alternatives on Civitai-hosted workers
* Full parameter catalog: the `Gemini25FlashCreateImageGenInput` and `Gemini25FlashEditImageGenInput` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface

---

---
url: /site/guide/getting-started.md
description: Create an API token and make your first request to the Civitai site API.
---

# Getting started

## 1. Generate an API token

API tokens are managed from your Civitai account page:

1. Sign in to [civitai.com](https://civitai.com).
2. Open [Account settings](https://civitai.com/user/account).
3. Scroll to the **API Keys** section and click **Add API key**.
4. Give the key a name and copy the generated token — it's shown only once.

Store the token somewhere safe. Treat it like a password.

## 2. Make your first request

Most site API endpoints are public — you can call them without a token:

```bash
curl "https://civitai.com/api/v1/models?limit=1"
```

Response (truncated):

```json
{
  "items": [
    {
      "id": 827184,
      "name": "WAI-illustrious-SDXL",
      "type": "Checkpoint",
      "creator": { "username": "WAI0731" },
      "modelVersions": [ /* ... */ ]
    }
  ],
  "metadata": {
    "nextCursor": "75363|932023|257749",
    "nextPage": "https://civitai.com/api/v1/models?limit=1&cursor=..."
  }
}
```
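
The `metadata.nextPage` field in that response suggests a simple way to walk results. The sketch below assumes the last page omits `nextPage`; `fetch_json` is a hypothetical callable that GETs a URL and returns the decoded body. See the Pagination guide for the authoritative rules.

```python
from typing import Any, Callable, Iterator, Optional


def iter_items(first_url: str,
               fetch_json: Callable[[str], dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every item, following metadata.nextPage until it is absent."""
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("items", [])
        url = page.get("metadata", {}).get("nextPage")
```

Because it is a generator, callers can stop early without fetching the remaining pages.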

## 3. Make an authenticated request

Some endpoints require a token — for example `/me`, which identifies the caller:

```bash
export CIVITAI_TOKEN="your-token-here"

curl -H "Authorization: Bearer $CIVITAI_TOKEN" \
  "https://civitai.com/api/v1/me"
```

```json
{
  "id": 12345,
  "username": "you",
  "tier": "founder",
  "status": "active",
  "isMember": true,
  "subscriptions": ["monthly"]
}
```
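
The same two calls can be made from Python with only the standard library. This is a sketch: `build_request` and `get_json` are illustrative helper names, and the token is attached as a Bearer header exactly as in the `curl` example.

```python
import json
import urllib.request
from typing import Any, Optional

API_BASE = "https://civitai.com/api/v1"


def build_request(path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding the Bearer header only when a token is given."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(f"{API_BASE}{path}", headers=headers)


def get_json(path: str, token: Optional[str] = None) -> Any:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_request(path, token)) as response:
        return json.load(response)


# Usage (requires network access and a real token):
#   get_json("/models?limit=1")
#   get_json("/me", token=os.environ["CIVITAI_TOKEN"])
```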

A few endpoints (`GET /models` with the `favorites` or `hidden` flag, for
example) also require authentication even though the base endpoint is public.
See [Authentication](./authentication) for the full list.

## Next steps

* [Authentication](./authentication) — token formats, query-param fallback, 401 behavior.
* [Pagination](./pagination) — walking through large result sets.
* [AIR identifiers](./air) — the URN format used throughout Civitai (and the Orchestration API).
* [Reference](../reference/) — parameters and response fields for every endpoint.

---

---
url: /orchestration/recipes/google.md
---

# Google image generation

Routes to Google's image-generation APIs (Vertex AI / Gemini API). Three models, selected via the `model` field:

| `model` | Also known as | Notes |
|---------|---------------|-------|
| `nano-banana-2` | Gemini 2.5 Flash Image, next-gen | **Default** — text-to-image + image-editing via `images[]`, high-resolution tier (up to 4K), optional web/Google search grounding. |
| `nano-banana-pro` | Gemini 2.5 Pro Image | Same shape as `nano-banana-2` for most purposes; pro tier for premium output. |
| `imagen4` | Imagen 4 | Google's dedicated image model (not Gemini-based). Natural-language + negative prompt, fewer aspect ratios, 1K only. |

**Default choice for new integrations**: `model: "nano-banana-2"`. It's fast, capable, supports editing via `images[]`, and has the widest aspect-ratio and resolution range. Step up to `nano-banana-pro` for hero-shot quality; reach for `imagen4` when you specifically want Google's older Imagen family semantics (negative prompts, stricter aspect-ratio set).

::: tip Gemini vs Google
The `gemini` engine ([Gemini image generation](./gemini)) exposes the same Gemini 2.5 Flash Image product as `model: "2.5-flash"` via the direct Gemini API, with a slightly different input shape. Pick based on which API semantics you prefer — this page covers the `google` engine.
:::

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* For image editing: one or more source image URLs, data URLs, or Base64 strings (Nano Banana only — Imagen 4 is create-only)

## nano-banana-2 (default — Gemini 2.5 Flash Image)

### Text-to-image

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "google",
      "model": "nano-banana-2",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "aspectRatio": "1:1",
      "resolution": "1K",
      "numImages": 1
    }
  }]
}
```

### Parameters

| Field | Default | Allowed | Notes |
|-------|---------|---------|-------|
| `prompt` | — ✅ | ≤ 50 000 chars | Natural-language, very long prompts permitted. |
| `aspectRatio` | `1:1` | `21:9`, `16:9`, `3:2`, `4:3`, `5:4`, `1:1`, `4:5`, `3:4`, `2:3`, `9:16` | |
| `resolution` | `1K` | `1K` / `2K` / `4K` | Multi-resolution tier — `4K` is slower and more expensive. |
| `numImages` | `1` | `1`–`4` | Nano Banana uses `numImages`, not `quantity`. |
| `images[]` | *(none)* | max 10 | Passing `images[]` switches to edit mode. URLs, data URLs, or Base64. |
| `seed` | random | int32 | Pin for reproducibility. |
| `enableWebSearch` | `false` | boolean | Let the model ground its output in fresh web-search results. |
| `enableGoogleSearch` | `false` | boolean | Let the model ground its output in Google Search results — useful for accurate depictions of real places/people/events. |

### Image editing

Drop one or more source images into `images[]` and the same endpoint switches to edit mode — no separate `operation` field:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "google",
      "model": "nano-banana-2",
      "prompt": "Make it a winter scene with snow falling",
      "aspectRatio": "1:1",
      "resolution": "1K",
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

Up to 10 reference images per call. Useful for prompt-driven edits and compositional blends.

### With web-search grounding

`enableWebSearch` / `enableGoogleSearch` let the model pull fresh factual context into its generation. Handy for depicting real locations, current events, or brands accurately:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "google",
      "model": "nano-banana-2",
      "prompt": "A realistic photo of the Eiffel Tower at night, with accurate lighting and modern signage",
      "aspectRatio": "16:9",
      "resolution": "2K",
      "enableWebSearch": true
    }
  }]
}
```

## nano-banana-pro

Pro-tier version of Nano Banana. Identical input shape minus the search-grounding toggles and `seed`. Reach for it when you want premium output quality on the same API:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "google",
      "model": "nano-banana-pro",
      "prompt": "A cinematic scene of a dragon perched on a mountain peak at dawn",
      "aspectRatio": "21:9",
      "resolution": "2K",
      "numImages": 1
    }
  }]
}
```

Same aspect-ratio / resolution enums, same `images[]` editing behaviour (up to 10 inputs). Most costly of the three Google models — use for hero shots, not bulk generation.

## imagen4

Google's dedicated Imagen 4 model. Different semantics from Nano Banana — supports `negativePrompt`, stricter aspect-ratio enum, no resolution tiers (implicit 1K), no editing:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "google",
      "model": "imagen4",
      "prompt": "A majestic fantasy landscape with floating islands, cinematic lighting",
      "negativePrompt": "blurry, low quality",
      "aspectRatio": "16:9",
      "numImages": 1
    }
  }]
}
```

| Field | Default | Allowed | Notes |
|-------|---------|---------|-------|
| `prompt` | — ✅ | ≤ 1 000 chars | Tighter than Nano Banana's 50k. |
| `negativePrompt` | `""` | ≤ 1 000 chars | Imagen-specific — Nano Banana doesn't accept one. |
| `aspectRatio` | `1:1` | `1:1`, `16:9`, `9:16`, `3:4`, `4:3` | Smaller set than Nano Banana. |
| `numImages` | `1` | `1`–`4` | |
| `seed` | random | int64 | |

No editing. No `resolution` picker — outputs are always 1K.
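
Since Imagen 4 rejects several fields that Nano Banana accepts, a client-side check can catch mistakes before a `400`. The sketch below encodes only the limits from the table above; `imagen4_problems` is a hypothetical helper and its messages are this example's wording, not the API's error text.

```python
from typing import Any

IMAGEN4_RATIOS = {"1:1", "16:9", "9:16", "3:4", "4:3"}
IMAGEN4_UNSUPPORTED = ("resolution", "images", "quantity")


def imagen4_problems(step_input: dict[str, Any]) -> list[str]:
    """Return human-readable problems with an imagen4 input (empty if none)."""
    problems: list[str] = []
    prompt = step_input.get("prompt", "")
    if not prompt:
        problems.append("prompt is required")
    elif len(prompt) > 1000:
        problems.append("prompt exceeds 1000 characters")
    if len(step_input.get("negativePrompt", "")) > 1000:
        problems.append("negativePrompt exceeds 1000 characters")
    ratio = step_input.get("aspectRatio", "1:1")
    if ratio not in IMAGEN4_RATIOS:
        problems.append(f"aspectRatio {ratio} is not supported by imagen4")
    if not 1 <= step_input.get("numImages", 1) <= 4:
        problems.append("numImages must be 1-4")
    for field in IMAGEN4_UNSUPPORTED:
        if field in step_input:
            problems.append(f"{field} is not a valid imagen4 field")
    return problems
```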

## Reading the result

All Google models emit the standard `imageGen` output:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.png" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Runtime

Google's API queue is the dominant factor. Typical wall times:

| Model / resolution | Per-image wall time | `wait` recommendation |
|--------------------|---------------------|-----------------------|
| `imagen4` (1K) | 8–20 s | `wait=60` fine |
| `nano-banana-2` (1K) | 8–20 s | `wait=60` fine |
| `nano-banana-2` (2K / 4K) | 20–60 s | `wait=60` sometimes; fall back to `wait=0` |
| `nano-banana-pro` (any) | 20–60 s depending on queue | `wait=60` sometimes; `wait=0` + polling is safer |

Use `wait=0` + polling for batches above `numImages: 2`, 4K output, or Pro tier.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Flat per-image pricing by model, `resolution`, and grounding toggles:

```
total = base × numImages
```

| Model | Base (per image) | Notes |
|-------|------------------|-------|
| `imagen4` | **40** | Fixed; aspect ratio doesn't affect price. |
| `nano-banana-2` (1K) | **104** | Default resolution tier. |
| `nano-banana-2` (2K) | **156** | |
| `nano-banana-2` (4K) | **208** | |
| `nano-banana-pro` (1K, text-only) | **160** | |
| `nano-banana-pro` (1K, with `images[]`) | **180** | Image-to-image carries a small premium. |
| `nano-banana-pro` (2K, text-only) | **230** | |
| `nano-banana-pro` (2K, with `images[]`) | **250** | |
| `nano-banana-pro` (4K, text-only) | **320** | |
| `nano-banana-pro` (4K, with `images[]`) | **340** | |

**Web-search grounding** (Nano Banana 2 only) adds **+20 Buzz per image** for each flag enabled — `enableWebSearch: true` and `enableGoogleSearch: true` stack (so +40 if both on).

Examples:

* `imagen4`, `numImages: 1` → **~40 Buzz**
* `nano-banana-2` 1K, `numImages: 1` → **~104 Buzz**
* `nano-banana-2` 1K + web search, `numImages: 1` → **~124 Buzz**
* `nano-banana-2` 4K, `numImages: 4` → ~832 Buzz
* `nano-banana-pro` 2K text-only, `numImages: 1` → **230 Buzz**; with `images[]` → **250 Buzz**
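
The table and grounding surcharge can be combined into one estimator. This is a sketch built only from the prices listed above; it assumes `images[]` adds no premium on `nano-banana-2` (the table lists none) and that grounding flags apply to `nano-banana-2` only. The function and argument names are illustrative.

```python
NANO_BANANA_2 = {"1K": 104, "2K": 156, "4K": 208}
NANO_BANANA_PRO = {"1K": 160, "2K": 230, "4K": 320}
PRO_IMAGE_PREMIUM = 20  # added when images[] is present on nano-banana-pro
GROUNDING_FEE = 20      # per enabled search flag, nano-banana-2 only


def google_cost(model: str, resolution: str = "1K", num_images: int = 1,
                has_images: bool = False, web_search: bool = False,
                google_search: bool = False) -> int:
    """Estimate Buzz for a google-engine imageGen step."""
    if model == "imagen4":
        base = 40
    elif model == "nano-banana-2":
        # Booleans count as 0 or 1, so both flags together add 40.
        base = NANO_BANANA_2[resolution] + GROUNDING_FEE * (web_search + google_search)
    elif model == "nano-banana-pro":
        base = NANO_BANANA_PRO[resolution] + (PRO_IMAGE_PREMIUM if has_images else 0)
    else:
        raise ValueError(f"unknown model: {model}")
    return base * num_images


print(google_cost("nano-banana-2", "4K", num_images=4))
```

Treat the result as an estimate and confirm with `whatif=true` when pricing matters.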

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with unknown property `quantity` | Sent `quantity` instead of `numImages` | Google uses `numImages`; OpenAI / Flux use `quantity`. Easy to mix up. |
| `400` with "aspectRatio must be one of" on Imagen 4 | Passed a Nano Banana–only ratio like `21:9` or `5:4` | Imagen 4's set is smaller — stick to `1:1`, `16:9`, `9:16`, `3:4`, `4:3`. |
| `400` with "resolution is not a valid property" on Imagen 4 | Imagen 4 has no `resolution` field | Drop it — Imagen 4 is always 1K. |
| `400` with "images is not a valid property" on Imagen 4 | Imagen 4 is create-only | Switch to `nano-banana-2` or `nano-banana-pro` for editing. |
| `400` with "images maxItems" | More than 10 reference images on Nano Banana | Trim to 10. |
| Output seems disconnected from reality (wrong year of events, nonexistent place) | No search grounding | Set `enableWebSearch: true` (or `enableGoogleSearch: true`) on `nano-banana-2`. |
| Request timed out (`wait` expired) | Large `numImages`, 4K resolution, or Pro tier on busy queue | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Google's content filter | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Gemini image generation](./gemini) — Gemini 2.5 Flash Image via the direct Gemini API (alternate routing to Nano Banana)
* [OpenAI image generation](./openai) — alternative commercial tier
* [Flux 2](./flux2) / [Flux 1](./flux1) / [Qwen](./qwen) — open-weights alternatives on Civitai-hosted workers
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref`
* Full parameter catalog: the `Imagen4ImageGenInput`, `NanoBananaProImageGenInput`, `NanoBanana2ImageGenInput` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface

---

---
url: /orchestration/recipes/grok.md
---

# Grok image generation

Routes to xAI's Grok image API. Two operations — `createImage` and `editImage` — and a wide aspect-ratio menu (including extreme-wide and extreme-tall variants beyond what Google or OpenAI expose).

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* For `editImage`: 1–3 source images (URLs, data URLs, or Base64)

## Text-to-image (`createImage`)

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "grok",
      "operation": "createImage",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "aspectRatio": "1:1",
      "quantity": 1
    }
  }]
}
```

### Parameters

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `operation` | ✅ | — | `createImage` or `editImage`. |
| `prompt` | ✅ | — | Natural-language. |
| `aspectRatio` | | `1:1` | See the ratio table below. |
| `quantity` | | `1` | `1`–`4`. |

### Aspect ratios

Grok exposes a wider aspect-ratio menu than other commercial engines:

| Category | Ratios |
|----------|--------|
| Ultra-wide | `2:1`, `20:9`, `19.5:9`, `16:9` |
| Landscape | `4:3`, `3:2` |
| Square | `1:1` |
| Portrait | `2:3`, `3:4` |
| Vertical | `9:16`, `9:19.5`, `9:20`, `1:2` |

Useful when you need phone-native vertical ratios (`9:19.5` / `9:20` match modern flagship screens) or cinema-wide output (`2:1`, `20:9`):

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "grok",
      "operation": "createImage",
      "prompt": "A sweeping cinematic view of a futuristic city skyline at dusk",
      "aspectRatio": "20:9",
      "quantity": 1
    }
  }]
}
```

::: warning `21:9` isn't in the enum
Grok's widest accepted ratio is `20:9`, so `21:9` (the common cinema label) is rejected. Use `20:9` as the closest cinematic-wide option.
:::
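
When a caller supplies an arbitrary ratio, it can be snapped to the nearest accepted value before submitting. The following is a minimal sketch, not part of the API: `closest_grok_ratio` and `ratio_value` are hypothetical helper names, and comparing ratios on a log scale is an assumption chosen so that wide and tall ratios are treated symmetrically.

```python
import math

# Accepted Grok `createImage` ratios, copied from the table above.
GROK_RATIOS = [
    "2:1", "20:9", "19.5:9", "16:9",
    "4:3", "3:2",
    "1:1",
    "2:3", "3:4",
    "9:16", "9:19.5", "9:20", "1:2",
]


def ratio_value(ratio: str) -> float:
    """Convert a "W:H" string into width divided by height."""
    width, height = ratio.split(":")
    return float(width) / float(height)


def closest_grok_ratio(requested: str) -> str:
    """Return the accepted ratio closest to `requested` on a log scale.

    Hypothetical client-side helper: the API does not snap ratios itself.
    """
    target = math.log(ratio_value(requested))
    return min(GROK_RATIOS, key=lambda r: abs(math.log(ratio_value(r)) - target))


print(closest_grok_ratio("21:9"))  # 20:9
```

Snapping on the client keeps the `400` out of the request path, at the price of silently changing the framing, so it may be worth logging when the returned ratio differs from the requested one.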

## Image editing (`editImage`)

Pass 1–3 reference images in `images[]`:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "grok",
      "operation": "editImage",
      "prompt": "Make it a winter scene with snow falling",
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

Edit operations don't take an `aspectRatio` — the output resolution follows the source(s).
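
The limits in this recipe (`quantity` 1–4, 1–3 source images on `editImage`, no `aspectRatio` on edits) can be checked before the request leaves the client. This is a sketch under the assumption that the limits are exactly as documented here; `check_grok_image_input` is a hypothetical helper, and the server-side schema remains authoritative.

```python
def check_grok_image_input(step_input: dict) -> list[str]:
    """Return a list of problems found in a Grok imageGen input (empty if none)."""
    problems = []
    operation = step_input.get("operation")
    if operation not in ("createImage", "editImage"):
        problems.append("operation must be createImage or editImage")
    if not step_input.get("prompt"):
        problems.append("prompt is required")
    quantity = step_input.get("quantity", 1)
    if not 1 <= quantity <= 4:
        problems.append("quantity must be between 1 and 4")
    if operation == "editImage":
        images = step_input.get("images", [])
        if not 1 <= len(images) <= 3:
            problems.append("editImage needs 1 to 3 images")
        if "aspectRatio" in step_input:
            problems.append("editImage does not take aspectRatio")
    return problems


print(check_grok_image_input({"operation": "editImage", "prompt": "Add snow", "images": []}))
```

Returning a list rather than raising on the first issue lets a UI show every problem at once.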

## Reading the result

The output is the standard `imageGen` shape, an `images[]` array with one entry per requested image (`quantity`):

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.png" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Runtime

xAI's API queue is the dominant factor. Typical wall time is 10–30 s per image. `wait=60` is comfortable for `quantity ≤ 2`; larger batches or busy periods warrant `wait=0` + polling.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Flat per-image pricing by operation:

```
total = base × quantity
```

| Operation | Base (per image) |
|-----------|------------------|
| `createImage` | **~26** |
| `editImage` | **~29** |

Examples:

* `createImage`, `quantity: 1` → **~26 Buzz**
* `createImage`, `quantity: 4` → ~104 Buzz
* `editImage` with 1 reference, `quantity: 1` → **~29 Buzz**

Aspect ratio and reference count don't affect Grok's Buzz price: the provider charges a flat per-image rate.
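
The `total = base × quantity` formula can be sketched as a small estimator. The rates are the approximate values from the table above, so the result is a ballpark figure only; `estimate_grok_image_buzz` is a hypothetical helper, and `whatif=true` remains the way to get an exact number.

```python
# Approximate per-image Buzz rates, copied from the table above.
GROK_IMAGE_BASE = {"createImage": 26, "editImage": 29}


def estimate_grok_image_buzz(operation: str, quantity: int = 1) -> int:
    """Apply total = base * quantity using the approximate table rates."""
    if operation not in GROK_IMAGE_BASE:
        raise ValueError(f"unknown operation: {operation}")
    return GROK_IMAGE_BASE[operation] * quantity


print(estimate_grok_image_buzz("createImage", 4))  # 104
```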

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "aspectRatio must be one of" | Ratio outside the accepted enum (e.g. `21:9`) | Pick a close equivalent from the table above — `20:9` is the closest cinematic wide. |
| `400` with "images minItems" on edit | Empty `images[]` on `editImage` | Include 1–3 source images. |
| `400` with "images maxItems" | More than 3 source images | Trim to 3. |
| `400` with "quantity must be ≤ 4" | Requested more than 4 in one call | Split into multiple submissions or use a different engine with a higher cap (Flux / OpenAI gpt-image-1 / Seedream go up to 10–12). |
| Request timed out (`wait` expired) | Large `quantity` or busy xAI queue | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | xAI content filter | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [OpenAI image generation](./openai) — alternative commercial tier
* [Google image generation](./google) / [Gemini](./gemini) — alternative commercial tier
* [Flux 2](./flux2) / [Qwen](./qwen) / [SDXL](./sdxl) — open-weights / sdcpp alternatives on Civitai-hosted workers
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref`
* Full parameter catalog: the `GrokCreateImageGenInput` and `GrokEditImageGenInput` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface

---

---
url: /orchestration/recipes/grok-video.md
---

# Grok video generation

xAI's Grok video model (Grok-Imagine-Video) via FAL, available through the `videoGen` step with `engine: "grok"`.

::: tip Grok image vs Grok video
The `grok` engine here is for **video generation**. For Grok image generation, see the separate [`imageGen` Grok recipe](./grok).
:::

Three operations: `text-to-video`, `image-to-video`, and `edit-video`. Two resolutions: 480p and 720p (default). Seven aspect ratios for text-to-video.

Grok video jobs typically run 1–4 minutes, so submit with `wait=0`.

## Text-to-video

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "grok",
      "operation": "text-to-video",
      "prompt": "A red fox trotting through a snowy forest at dusk",
      "aspectRatio": "16:9",
      "duration": 6,
      "resolution": "720p"
    }
  }]
}
```

## Image-to-video

Pass exactly one image in `images[]` to animate from it. Aspect ratio is inferred from the source image when `aspectRatio` is `"auto"`:

```json
{
  "engine": "grok",
  "operation": "image-to-video",
  "prompt": "The subject slowly turns their head and looks toward the horizon",
  "images": ["https://image.civitai.com/.../photo.jpeg"],
  "duration": 6,
  "resolution": "720p",
  "aspectRatio": "auto"
}
```

## Portrait video

Text-to-video accepts a wide aspect ratio set — use `9:16` for mobile-first content:

```json
{
  "engine": "grok",
  "operation": "text-to-video",
  "prompt": "A person speaking passionately on stage, dynamic lighting",
  "aspectRatio": "9:16",
  "duration": 6,
  "resolution": "720p"
}
```

## Video editing

Edit an existing video guided by a prompt. The input video is automatically resized to 854×480 and truncated to 8 seconds:

```json
{
  "engine": "grok",
  "operation": "edit-video",
  "prompt": "Transform the scene into a vintage sepia-toned film",
  "videoUrl": "https://example.com/input.mp4",
  "duration": 6,
  "resolution": "720p"
}
```

::: warning Edit-video is capped at 8 s
Grok automatically trims source videos longer than 8 seconds. Cost is based on the analyzed duration, not the requested `duration` field.
:::

## Parameters

### Text-to-video

| Field | Default | Notes |
|-------|---------|-------|
| `engine` | — ✅ | `"grok"` |
| `operation` | `"text-to-video"` | `"text-to-video"` |
| `prompt` | — ✅ | Generation prompt. |
| `duration` | `6` | 1–15 seconds. |
| `resolution` | `"720p"` | `"480p"` or `"720p"` |
| `aspectRatio` | `"16:9"` | `"16:9"`, `"4:3"`, `"3:2"`, `"1:1"`, `"2:3"`, `"3:4"`, `"9:16"` |

### Image-to-video

Same as text-to-video plus:

| Field | Default | Notes |
|-------|---------|-------|
| `operation` | — ✅ | `"image-to-video"` |
| `images[]` | — ✅ | Exactly 1 image (URL, data URL, or Base64). |
| `aspectRatio` | `"auto"` | `"auto"` infers ratio from the source image. Explicit ratios also accepted. |

### Edit-video

| Field | Default | Notes |
|-------|---------|-------|
| `operation` | — ✅ | `"edit-video"` |
| `videoUrl` | — ✅ | Source video URL. Resized to 854×480, capped at 8 s. |
| `duration` | `6` | Informational for the request; the actual duration is determined by the source video length (up to 8 s). |
| `resolution` | `"720p"` | `"480p"` or `"720p"` |

## Cost

Per-second pricing; `total = costPerSecond × duration`.

| Operation | Resolution | Buzz/s | Example — 6 s |
|-----------|------------|--------|---------------|
| `text-to-video` / `image-to-video` | `480p` | 65 | **390** |
| `text-to-video` / `image-to-video` | `720p` | 91 | **546** |
| `edit-video` | `480p` | 78 | **468** |
| `edit-video` | `720p` | 104 | **624** |

For `edit-video`, cost uses the **analyzed source video duration** (capped at 8 s), not the `duration` field in the request.
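
The per-second pricing, including the 8-second cap on `edit-video`, can be sketched as follows. This is an illustration built from the table above: `estimate_grok_video_buzz` and its `source_seconds` argument are hypothetical, and how fractional source lengths are billed is not stated here, so the sketch simply multiplies.

```python
# Buzz per second, copied from the cost table above.
BUZZ_PER_SECOND = {
    "generate": {"480p": 65, "720p": 91},      # text-to-video and image-to-video
    "edit-video": {"480p": 78, "720p": 104},
}
EDIT_SOURCE_CAP_SECONDS = 8


def estimate_grok_video_buzz(operation, resolution, duration=6, source_seconds=None):
    """Estimate Buzz as costPerSecond * billed seconds."""
    if operation == "edit-video":
        if source_seconds is None:
            raise ValueError("edit-video cost needs the source video length")
        # Edits are billed on the analyzed source length, capped at 8 s.
        billed_seconds = min(source_seconds, EDIT_SOURCE_CAP_SECONDS)
        rate = BUZZ_PER_SECOND["edit-video"][resolution]
    elif operation in ("text-to-video", "image-to-video"):
        billed_seconds = duration
        rate = BUZZ_PER_SECOND["generate"][resolution]
    else:
        raise ValueError(f"unknown operation: {operation}")
    return rate * billed_seconds


print(estimate_grok_video_buzz("text-to-video", "720p", duration=6))      # 546
print(estimate_grok_video_buzz("edit-video", "480p", source_seconds=12))  # 624
```

The second call shows why an edit can cost more than the `duration` field suggests: a 12-second source is billed as 8 seconds regardless of the requested 6.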

## Reading the result

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Long-running jobs

Grok video typically completes in 1–4 minutes. Use `wait=0` + polling or webhooks:

* **Webhooks** (recommended): `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks)
* **Polling**: `GET /v2/consumer/workflows/{workflowId}` every 10–30 s
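
The polling option can be sketched as a loop around a caller-supplied fetch function. Everything named here is illustrative: `poll_workflow` is a hypothetical helper, `fetch` stands in for whatever code performs the `GET` and decodes the JSON, and treating only `succeeded` and `failed` as terminal is an assumption based on the statuses that appear in this recipe.

```python
import time
from typing import Callable

# Terminal statuses that appear in this recipe; others may exist in the API.
TERMINAL_STATUSES = {"succeeded", "failed"}


def poll_workflow(
    fetch: Callable[[], dict],
    interval_seconds: float = 10.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until the workflow reaches a terminal status.

    `fetch` is a caller-supplied function that performs the GET request
    and returns the decoded workflow JSON.
    """
    for attempt in range(max_attempts):
        workflow = fetch()
        if workflow.get("status") in TERMINAL_STATUSES:
            return workflow
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"workflow still running after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in production the default `time.sleep` is used.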

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "images must have exactly 1 item" | Sent 0 or 2+ images to `image-to-video` | Image-to-video requires exactly 1 source image. |
| `400` with "videoUrl is required" | Missing `videoUrl` on `edit-video` | Provide the source video URL. |
| `400` with "aspectRatio must be one of" on image-to-video | Sent an unsupported ratio | Use `"auto"` or one of the seven explicit ratios that text-to-video accepts. |
| Cost higher than expected on edit-video | Source video longer than requested duration | Input is truncated to 8 s; cost is based on the actual analyzed length. |
| Step `failed`, `reason = "no_provider_available"` | No Grok worker available | Retry shortly. |
| Step `failed`, `reason = "blocked"` | xAI content policy | Don't retry the same input. |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling
* [Grok image generation](./grok) — Grok for images
* [Kling video generation](./kling) — comparable commercial video model
* [Veo 3 video generation](./veo3) — Google's video model

---

---
url: /orchestration/recipes/happy-horse.md
---

# Happy-Horse video generation

Alibaba's Happy-Horse video model, served through FAL. Four operations cover the common video workflows: text-to-video, image-to-video, video-to-video editing, and multi-character reference generation. The operation is selected by an explicit `operation` discriminator — fields invalid for that operation are rejected with a `400`.

| `operation` | Required inputs | What it does |
|---|---|---|
| `textToVideo` | `prompt` | Generate a clip from a text prompt. |
| `imageToVideo` | `image` | Animate a single source image as the first frame. |
| `videoEdit` | `sourceVideo`, `prompt` | Re-paint or restyle an existing clip; optional reference images guide the look. |
| `referenceToVideo` | `prompt`, `images` (1–9) | Subject-consistent generation using up to 9 character references. Cite them as `character1`…`character9` in the prompt. |

**Default choice**: `version: "v1.0"`, `resolution: "1080p"`, `duration: 5`. All Happy-Horse jobs exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) — always submit with `wait=0`.

## The request shape

Every Happy-Horse request is a single `videoGen` step on [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). Three keys select which leaf schema the rest of the body is validated against:

```json
{
  "$type": "videoGen",
  "input": {
    "engine":    "happyHorse",
    "version":   "v1.0",
    "operation": "textToVideo"
  }
}
```
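
Because each operation has its own required inputs, a client can check them before submitting. The sketch below encodes only the operation table from this page; `missing_inputs` is a hypothetical helper, and the per-operation schema in the API reference remains the source of truth.

```python
# Required inputs per operation, copied from the operation table above.
REQUIRED_BY_OPERATION = {
    "textToVideo": ["prompt"],
    "imageToVideo": ["image"],
    "videoEdit": ["sourceVideo", "prompt"],
    "referenceToVideo": ["prompt", "images"],
}


def missing_inputs(step_input: dict) -> list[str]:
    """List required keys that are absent or empty for the chosen operation."""
    operation = step_input.get("operation")
    if operation not in REQUIRED_BY_OPERATION:
        return ["operation"]
    required = ["engine", "version"] + REQUIRED_BY_OPERATION[operation]
    return [key for key in required if not step_input.get(key)]


print(missing_inputs({"engine": "happyHorse", "version": "v1.0", "operation": "videoEdit", "prompt": "Repaint"}))
```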

### Source-media inputs

`videoEdit` accepts `sourceVideo` as either:

* a Civitai AIR URN (`urn:air:…`), or
* a civitai-hosted URL (`image.civitai.com`, orchestrator blob URLs, civitai-managed R2 / B2 / Spaces).

Arbitrary third-party URLs are **not** fetched — requests that pass one are rejected with a `400`. Upload the video to Civitai first and pass the resulting URL. `image`, `images`, and `referenceImages` go through the image pipeline and *do* accept external URLs — only `sourceVideo` has this restriction.
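
A rough client-side pre-check for `sourceVideo` might look like the sketch below. It is deliberately conservative: only the `urn:air:` prefix and the `image.civitai.com` host are named explicitly in the text, so any other Civitai-managed hosts have to be supplied by the caller through the hypothetical `extra_hosts` argument. The function name is also an assumption, and the server's own check is what actually decides.

```python
from urllib.parse import urlparse

# Only this host is named explicitly above; blob and storage hosts are not listed.
KNOWN_CIVITAI_HOSTS = {"image.civitai.com"}


def looks_like_valid_source_video(value: str, extra_hosts: frozenset = frozenset()) -> bool:
    """Return True for an AIR URN or a URL on a known Civitai host."""
    if value.startswith("urn:air:"):
        return True
    hostname = urlparse(value).hostname  # None when the value is not a URL
    return hostname in KNOWN_CIVITAI_HOSTS | set(extra_hosts)


print(looks_like_valid_source_video("https://example.com/input.mp4"))  # False
```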

## textToVideo

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "happyHorse",
      "version": "v1.0",
      "operation": "textToVideo",
      "prompt": "A little girl walking on a road at sunset, cinematic lighting, smooth camera movement",
      "aspectRatio": "16:9",
      "resolution": "1080p",
      "duration": 5
    }
  }]
}
```

## imageToVideo

Pass a single image as the first frame; `prompt` becomes optional and only steers the motion.

```json
{
  "engine": "happyHorse",
  "version": "v1.0",
  "operation": "imageToVideo",
  "prompt": "Camera slowly pushes in",
  "image": "https://image.civitai.com/.../first-frame.jpeg",
  "resolution": "1080p",
  "duration": 5
}
```

`aspectRatio` is **not** accepted here; output dimensions are derived from the input image. Source images must be at least 300 px on the short side, no larger than 10 MB, and within a 1:2.5–2.5:1 aspect range.
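
The source-image constraints can be checked locally once the image dimensions and file size are known. This is a sketch: `check_first_frame` is a hypothetical helper, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
MIN_SHORT_SIDE_PX = 300
MAX_BYTES = 10 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes


def check_first_frame(width: int, height: int, size_bytes: int) -> list[str]:
    """Return the constraint violations for an imageToVideo source image."""
    problems = []
    short_side, long_side = min(width, height), max(width, height)
    if short_side < MIN_SHORT_SIDE_PX:
        problems.append("short side is under 300 px")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 10 MB")
    # long/short <= 2.5 written with integers to avoid float rounding.
    if long_side * 2 > short_side * 5:
        problems.append("aspect ratio is outside 1:2.5 to 2.5:1")
    return problems


print(check_first_frame(3000, 1000, 2_000_000))
```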

## videoEdit

Re-paint or restyle an existing clip. The output duration matches the source; `duration` on the request applies to the cost preview only.

```json
{
  "engine": "happyHorse",
  "version": "v1.0",
  "operation": "videoEdit",
  "prompt": "Repaint the scene in vibrant anime style; reference @Image1 for the character outfit",
  "sourceVideo": "https://image.civitai.com/.../clip.webm",
  "referenceImages": [
    "https://image.civitai.com/.../style.jpeg"
  ],
  "audioSetting": "auto",
  "resolution": "1080p"
}
```

* `referenceImages` is optional — pass 0–5 images. Cite them in the prompt as `@Image1`–`@Image5`.
* `audioSetting`: `"auto"` regenerates a soundtrack to match the edit; `"origin"` keeps the source audio intact.
* FAL bills both the input *and* the output second on this operation, so the per-second rate is double the other modes — see [Cost](#cost).

## referenceToVideo

Generate with 1–9 character references. Cite each in the prompt with `character1`, `character2`, … `character9`.

```json
{
  "engine": "happyHorse",
  "version": "v1.0",
  "operation": "referenceToVideo",
  "prompt": "character1 and character2 walk together through a neon-lit alley",
  "images": [
    "https://image.civitai.com/.../subject-a.jpeg",
    "https://image.civitai.com/.../subject-b.jpeg"
  ],
  "aspectRatio": "16:9",
  "resolution": "1080p",
  "duration": 5
}
```

Reference images must be ≥400 px on the short side and ≤10 MB each.
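
A prompt that cites `character3` while only two images are attached is easy to produce by accident. The sketch below looks for that mismatch; it assumes `characterN` refers to the Nth entry of `images` (1-based), which the text implies but does not state outright, and both function names are hypothetical.

```python
import re


def cited_characters(prompt: str) -> set[int]:
    """Collect the N of every characterN token (1-9) in the prompt."""
    return {int(n) for n in re.findall(r"\bcharacter([1-9])\b", prompt)}


def check_character_references(prompt: str, images: list[str]) -> list[str]:
    """Flag an invalid image count and citations with no matching image."""
    problems = []
    if not 1 <= len(images) <= 9:
        problems.append("referenceToVideo needs 1 to 9 images")
    for n in sorted(cited_characters(prompt)):
        if n > len(images):
            problems.append(f"prompt cites character{n} but only {len(images)} image(s) were given")
    return problems


prompt = "character1 and character2 walk together through a neon-lit alley"
print(check_character_references(prompt, ["a.jpeg"]))
```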

## Parameters

Shared across operations unless noted. The per-operation schema in the [API reference](/orchestration/reference/) is authoritative.

| Field | Default | Used by | Notes |
|---|---|---|---|
| `engine` | — ✅ | All | `"happyHorse"` |
| `version` | — ✅ | All | `"v1.0"` |
| `operation` | — ✅ | All | See the table above. |
| `prompt` | — ✅ | All (optional on `imageToVideo`) | Up to 2500 characters. |
| `resolution` | `"1080p"` | All | `"720p"` or `"1080p"`. |
| `duration` | `5` | All | Integer seconds, 3–15. On `videoEdit` it feeds the cost preview only; the output length follows the source video. |
| `aspectRatio` | `"16:9"` | `textToVideo`, `referenceToVideo` | `"16:9"`, `"9:16"`, `"1:1"`, `"4:3"`, `"3:4"`. |
| `image` | — ✅ | `imageToVideo` | Single image used as the first frame. |
| `sourceVideo` | — ✅ | `videoEdit` | Civitai-hosted URL or AIR URN — not arbitrary external. |
| `referenceImages[]` | `[]` | `videoEdit` | 0–5 images. |
| `audioSetting` | `"auto"` | `videoEdit` | `"auto"` regenerates audio, `"origin"` preserves it. |
| `images[]` | — ✅ | `referenceToVideo` | 1–9 character references. |
| `seed` | random | All | Integer for reproducibility, 0–2147483647. |
| `enableSafetyChecker` | `true` | All | Disable only when you have your own moderation. |

## Cost

Billed per output second in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

```
total = buzzPerSecond × duration
```

| Operation | 720p | 1080p |
|---|---|---|
| `textToVideo`, `imageToVideo`, `referenceToVideo` | **182** Buzz/s | **364** Buzz/s |
| `videoEdit` | **364** Buzz/s | **728** Buzz/s |

Example totals at `duration: 5`:

| Operation | 720p | 1080p |
|---|---|---|
| `textToVideo` / `imageToVideo` / `referenceToVideo` | **910** | **1 820** |
| `videoEdit` | **1 820** | **3 640** |

`videoEdit` is double the others because FAL bills both the input second and the output second — already encoded in the rate above.
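
The rate tables reduce to one base rate per resolution with a doubling for `videoEdit`. The following estimator is a sketch built from those tables; `estimate_happy_horse_buzz` is a hypothetical helper, and `whatif=true` is still the way to get the exact figure.

```python
# Buzz per second for the non-edit operations, from the rate table above.
BASE_BUZZ_PER_SECOND = {"720p": 182, "1080p": 364}


def estimate_happy_horse_buzz(operation: str, resolution: str, duration: int) -> int:
    """Apply total = buzzPerSecond * duration, doubling the rate for videoEdit."""
    rate = BASE_BUZZ_PER_SECOND[resolution]
    if operation == "videoEdit":
        rate *= 2  # input and output seconds are both billed
    return rate * duration


print(estimate_happy_horse_buzz("textToVideo", "1080p", 5))  # 1820
print(estimate_happy_horse_buzz("videoEdit", "1080p", 5))    # 3640
```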

## Reading the result

Same as any `videoGen` step — a single `video` blob:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Long-running jobs

Happy-Horse jobs typically complete in 2–6 minutes (longer for `videoEdit` and 1080p). All exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) — submit with `wait=0` and:

* **Webhooks** (recommended): register a callback with `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks).
* **Polling**: `GET /v2/consumer/workflows/{workflowId}` on a 10 s → 30 s → 60 s cadence.
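
The 10 s → 30 s → 60 s cadence can be expressed as a delay generator that a polling loop consumes. This is a small sketch; `poll_delays` is a hypothetical name, and holding at 60 s after the ramp is one reading of the cadence.

```python
from itertools import chain, islice, repeat
from typing import Iterator


def poll_delays(ramp: tuple = (10, 30), steady: int = 60) -> Iterator[int]:
    """Yield the ramp-up delays once, then the steady delay forever."""
    return chain(ramp, repeat(steady))


print(list(islice(poll_delays(), 5)))  # [10, 30, 60, 60, 60]
```

A loop would call `time.sleep(delay)` for each yielded value between `GET` requests, stopping once the workflow reaches a terminal status.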

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `400` with unknown field | Field isn't valid for this `operation` | Each operation maps to its own typed schema (`HappyHorseV1<Op>Input`); check it via [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). |
| `400` "`sourceVideo` must be a Civitai AIR URN…" | Passed an external URL to `sourceVideo` | Re-upload the video to Civitai and use the resulting URL, or pass a `urn:air:…` URN. |
| `400` "referenceToVideo requires between 1 and 9 reference images" | `images` was empty or had >9 entries | Provide 1–9 images. |
| `400` "videoEdit accepts at most 5 reference images" | `referenceImages` had >5 entries | Trim to 5. |
| Step `failed`, `reason = "no_provider_available"` | FAL queue busy | Retry shortly. |
| Step `failed`, `reason = "blocked"` | Safety checker rejected input/output | Re-prompt; if you've handled moderation upstream, set `enableSafetyChecker: false`. |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling
* [Veo 3 video generation](./veo3) — comparable commercial multi-mode video model
* [Kling video generation](./kling) — another commercial multi-mode video model
* Full parameter catalog: the `HappyHorseV1<Operation>Input` schemas in the [API reference](/orchestration/reference/)
* [`videoGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/videoGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `videoGen` surface

---

---
url: /orchestration/recipes/hunyuan.md
---

# HunyuanVideo generation

Tencent's HunyuanVideo open model, running on Civitai's Comfy workers. Text-to-video with LoRA support for custom subjects, styles, and motions.

```json
{
  "$type": "videoGen",
  "input": {
    "engine": "hunyuan",
    "prompt": "...",
    "width": 854,
    "height": 480,
    "duration": 5
  }
}
```

HunyuanVideo is compute-intensive — always submit with `wait=0`.

## Text-to-video

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "hunyuan",
      "prompt": "A majestic waterfall cascading down mossy rocks in a lush rainforest, slow motion",
      "width": 854,
      "height": 480,
      "duration": 5,
      "fps": 24,
      "steps": 40
    }
  }]
}
```

## With LoRAs

Attach community LoRAs to bias subject, style, or motion. Format: `{ "air": "<AIR URN>", "strength": 0.0–1.0 }`:

```json
{
  "engine": "hunyuan",
  "prompt": "A character from the LoRA walking through a neon-lit city at night",
  "width": 854,
  "height": 480,
  "duration": 5,
  "fps": 24,
  "steps": 40,
  "loras": [
    { "air": "urn:air:hyv1:lora:civitai:123456@789012", "strength": 0.8 }
  ]
}
```

## Using a custom model checkpoint

The default model is the base HunyuanVideo checkpoint. Override with any Civitai-hosted HunyuanVideo checkpoint using its AIR URN:

```json
{
  "engine": "hunyuan",
  "model": "urn:air:hyv1:checkpoint:civitai:<modelId>@<versionId>",
  "prompt": "...",
  "width": 854,
  "height": 480,
  "duration": 5
}
```

## Parameters

| Field | Default | Notes |
|-------|---------|-------|
| `engine` | — ✅ | `"hunyuan"` |
| `prompt` | — ✅ | Generation prompt. |
| `model` | *(base HunyuanVideo)* | AIR URN for an alternative checkpoint. |
| `width` | `480` | Output width in pixels. Larger → slower and more expensive. |
| `height` | `480` | Output height in pixels. |
| `duration` | `5` | 1–30 seconds. |
| `fps` | `25` | Frame rate. Common values: `24`, `25`, `30`. |
| `steps` | `40` | 10–50 diffusion steps. More steps = higher quality, longer runtime. |
| `cfgScale` | `4` | 0–100. Guidance scale — lower is more creative. |
| `loras[]` | `[]` | Array of `{ air, strength }` LoRA attachments. |
| `seed` | random | Integer for reproducibility. |

## Recommended resolutions

| Use case | `width` | `height` | Notes |
|----------|---------|----------|-------|
| Fast / prototype | `480` | `480` | Square, minimal cost. |
| Landscape 480p | `854` | `480` | 16:9, good balance. |
| Portrait 480p | `480` | `854` | 9:16 for mobile. |
| Landscape 720p | `1280` | `720` | High quality; significantly slower. |

::: tip Resolution and cost
Cost scales approximately with pixel count × duration × steps. Doubling both `width` and `height` (4× the pixel count) increases cost roughly 4×. Use `whatif=true` to preview exact cost before committing.
:::
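
To compare two settings before spending Buzz, the proportionality above can be turned into a relative factor. This is a rough sketch: `relative_cost` is a hypothetical helper, the 854×480, 5 s, 40-step baseline is just an example, and rounding and minimums in the real price are ignored.

```python
def relative_cost(width: int, height: int, duration: int, steps: int,
                  *, base: tuple = (854, 480, 5, 40)) -> float:
    """Ratio of pixel count * duration * steps against a baseline setting."""
    base_width, base_height, base_duration, base_steps = base
    work = width * height * duration * steps
    base_work = base_width * base_height * base_duration * base_steps
    return work / base_work


print(relative_cost(1708, 960, 5, 40))            # 4.0
print(round(relative_cost(1280, 720, 5, 40), 2))  # 2.25
```

The second line shows that moving from 854×480 to 1280×720 is roughly a 2.25× step, well short of the 4× that doubling both dimensions would give.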

## Cost

HunyuanVideo cost depends on `width × height × duration × fps × steps`. The formula uses GPU-second estimation with a 5× markup, rounded to the nearest 25 Buzz (minimum 100 Buzz).
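
The final rounding step can be illustrated on its own. The GPU-second estimate itself is not documented here, so the sketch takes an already-computed raw figure as input; `round_to_buzz_step` is a hypothetical name, and rounding exact ties upward is an assumption.

```python
import math


def round_to_buzz_step(raw_buzz: float, step: int = 25, minimum: int = 100) -> int:
    """Round to the nearest multiple of `step`, then apply the minimum."""
    rounded = step * math.floor(raw_buzz / step + 0.5)  # ties round up (assumed)
    return max(minimum, rounded)


print(round_to_buzz_step(612))  # 600
print(round_to_buzz_step(613))  # 625
print(round_to_buzz_step(40))   # 100
```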

Use `whatif=true` to get an exact preview:

```bash
curl -s "https://orchestration.civitai.com/v2/consumer/workflows?whatif=true" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{"steps":[{"$type":"videoGen","input":{"engine":"hunyuan","prompt":"...","width":854,"height":480,"duration":5,"steps":40}}]}'
```

Approximate ranges (854×480, 24 fps):

| Duration | Steps | Approx. Buzz |
|----------|-------|--------------|
| 3 s | 20 | ~200–400 |
| 5 s | 40 | ~500–1 000 |
| 10 s | 40 | ~1 000–2 000 |

Actual cost varies with GPU load and model.

## Reading the result

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Long-running jobs

HunyuanVideo is compute-heavy. Expect 5–30 minutes depending on resolution, duration, and steps. Use `wait=0` + polling or webhooks:

* **Webhooks** (recommended): `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks)
* **Polling**: `GET /v2/consumer/workflows/{workflowId}` every 30–60 s

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "steps out of range" | Value outside 10–50 | Clamp to 10–50. |
| `400` with "duration out of range" | Value outside 1–30 | Clamp to 1–30. |
| Very long queue wait | Large resolution / many steps | Reduce `width`/`height` or `steps` for iteration; scale up for final renders. |
| Step `failed`, `reason = "no_provider_available"` | No Comfy worker with HunyuanVideo warm | Retry shortly. |
| Output looks blurry at high resolution | Too few steps | Increase `steps` to 40–50 for larger resolutions. |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling
* [LTX2 video generation](./ltx2) — another Comfy-based open video model, generally faster
* [WAN video generation](./wan) — another Comfy/FAL open video model with broad operation support

---

---
url: /orchestration/recipes/convert-image.md
---

# Image conversion

`convertImage` is a utility step for format conversion, resizing, and blurring. It applies zero or more transforms in order, then encodes the result to the requested format. Cost is a flat **1 Buzz** regardless of image size, number of transforms, or output format.

## The request shape

```jsonc
{
  "$type": "convertImage",
  "input": {
    "image":      "https://...",       // source — URL, data URL, or Base64
    "transforms": [ /* optional */ ],  // resize / blur — applied in order
    "output":     { "format": "jpeg" } // required — format + per-format settings
  }
}
```

`transforms` is optional (omit to change format or settings only). `output` is required.

## Examples

### Format conversion

Convert any image to JPEG:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "convertImage",
    "input": {
      "image": "https://image.civitai.com/.../source.png",
      "output": { "format": "jpeg", "quality": 85 }
    }
  }]
}
```

### Resize then convert

Resize to a target width (aspect ratio preserved) and encode to WebP:

```json
{
  "$type": "convertImage",
  "input": {
    "image": "https://...",
    "transforms": [{ "type": "resize", "targetWidth": 512 }],
    "output": { "format": "webp", "quality": 85 }
  }
}
```
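
Since only `targetWidth` is supplied, the output height follows from the source aspect ratio. The sketch below shows the arithmetic; `resized_height` is a hypothetical helper, and rounding to the nearest pixel is an assumption, so the real output may differ by a pixel.

```python
def resized_height(source_width: int, source_height: int, target_width: int) -> int:
    """Height that preserves the source aspect ratio at `target_width`."""
    if not 1 <= target_width <= 4096:
        raise ValueError("targetWidth must be between 1 and 4096")
    return max(1, round(source_height * target_width / source_width))


print(resized_height(2048, 1024, 512))  # 256
```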

### Region blur

Blur one or more rectangular areas — useful for privacy masking:

```json
{
  "$type": "convertImage",
  "input": {
    "image": "https://...",
    "transforms": [{
      "type": "blur",
      "blur": 60,
      "mode": "include",
      "regions": [
        { "x1": 50, "y1": 50, "x2": 400, "y2": 400 }
      ]
    }],
    "output": { "format": "jpeg", "quality": 85 }
  }
}
```

`mode: "include"` blurs only inside the regions; the rest stays sharp. `mode: "exclude"` blurs everything *except* the regions — use it to protect a subject while blurring the background.

### Full-image blur

`mode: "exclude"` with an empty `regions` array blurs the entire image (nothing is excluded):

```json
{
  "type": "blur",
  "blur": 40,
  "mode": "exclude",
  "regions": []
}
```
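
The two modes are mirror images of each other, which a per-pixel predicate makes concrete. This models the documented semantics only, not the actual image processing; `pixel_is_blurred` is a hypothetical name, and treating region edges as inclusive is an assumption.

```python
def pixel_is_blurred(x: int, y: int, mode: str, regions: list[dict]) -> bool:
    """Decide whether the pixel at (x, y) falls in the blurred area."""
    inside = any(
        r["x1"] <= x <= r["x2"] and r["y1"] <= y <= r["y2"]  # edges inclusive (assumed)
        for r in regions
    )
    if mode == "include":
        return inside       # blur only inside the regions
    if mode == "exclude":
        return not inside   # blur everything except the regions
    raise ValueError("mode must be 'include' or 'exclude'")


box = [{"x1": 50, "y1": 50, "x2": 400, "y2": 400}]
print(pixel_is_blurred(100, 100, "include", box))  # True
print(pixel_is_blurred(100, 100, "exclude", []))   # True
```

With an empty `regions` list no pixel is ever "inside", so `exclude` blurs every pixel and `include` blurs none.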

### PNG with metadata stripped

```json
{
  "$type": "convertImage",
  "input": {
    "image": "https://...",
    "output": { "format": "png", "hideMetadata": true }
  }
}
```

## Transforms reference

Transforms run in array order. You can chain multiple transforms — for example, resize first and then blur.

### `resize`

| Field | Default | Notes |
|-------|---------|-------|
| `type` | — ✅ | Must be `"resize"`. |
| `targetWidth` | *(none)* | Target width in pixels, 1–4096. Height is calculated to preserve aspect ratio. |

### `blur`

| Field | Default | Notes |
|-------|---------|-------|
| `type` | — ✅ | Must be `"blur"`. |
| `blur` | — ✅ | Gaussian blur intensity, 1–100. |
| `mode` | — ✅ | `"include"` — blur only inside regions. `"exclude"` — blur everywhere except regions. |
| `regions` | `[]` | Pixel-coordinate rectangles `{ x1, y1, x2, y2 }`. With `mode: "exclude"` and no regions, the entire image is blurred. With `mode: "include"` and no regions, nothing is blurred. |

## Output formats reference

### `jpeg`

| Field | Default | Notes |
|-------|---------|-------|
| `format` | — ✅ | `"jpeg"` |
| `quality` | `85` | 1–100. Higher = better quality, larger file. |
| `hideMetadata` | `false` | Strip EXIF and other metadata. |

### `png`

| Field | Default | Notes |
|-------|---------|-------|
| `format` | — ✅ | `"png"` |
| `hideMetadata` | `false` | Strip metadata. |

PNG is lossless — no quality setting.

### `webp`

| Field | Default | Notes |
|-------|---------|-------|
| `format` | — ✅ | `"webp"` |
| `quality` | `85` | 1–100. Applies only when `lossless: false`. |
| `lossless` | `false` | Enable lossless WebP compression. |
| `maxFrames` | `null` | Cap frame count for animated sources. Set to `1` to extract only the first frame. |
| `hideMetadata` | `false` | Strip metadata. |

### `gif`

| Field | Default | Notes |
|-------|---------|-------|
| `format` | — ✅ | `"gif"` |
| `maxFrames` | `null` | Cap frame count. Set to `1` to extract the first frame. |
| `hideMetadata` | `false` | Strip metadata. |

::: tip JPEG and PNG with animated sources
JPEG and PNG are inherently single-frame. Animated source images (GIF, animated WebP) are automatically reduced to the first frame when encoding to these formats — there is no `maxFrames` field to set. Use WebP or GIF output to preserve animation.
:::

## Reading the result

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "convertImage",
    "status": "succeeded",
    "output": {
      "blob": {
        "id": "blob_...",
        "url": "https://.../signed.jpg",
        "width": 512,
        "height": 342
      }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

::: tip Result caching
`convertImage` is deterministic: the same source image, transforms, and output settings always produce the same blob. The orchestrator caches the result, so repeated identical calls skip re-processing and return the cached blob immediately.
:::

## Cost

Flat **1 Buzz** per step — regardless of source image size, number of transforms, or output format.

## Chaining with other steps

`convertImage` is most useful as a post-processing step. Chain it after `imageGen` using `$ref` to reference the previous step's output:

```json
{
  "steps": [
    {
      "name": "gen",
      "$type": "imageGen",
      "input": {
        "engine": "flux2",
        "prompt": "A photorealistic cat sitting in a sunny garden"
      }
    },
    {
      "name": "convert",
      "$type": "convertImage",
      "input": {
        "image": { "$ref": "gen", "path": "output.images[0].url" },
        "transforms": [{ "type": "resize", "targetWidth": 1024 }],
        "output": { "format": "webp", "quality": 90 }
      }
    }
  ]
}
```

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "output is required" | Missing `output` field | `output` is always required — include at minimum `{ "format": "jpeg" }`. |
| `400` with "targetWidth out of range" | Value outside 1–4096 | Clamp to 1–4096. |
| `400` with "blur out of range" | Value outside 1–100 | Clamp to 1–100. |
| `400` with "mode is required" | Blur transform sent without `mode` | `mode` is required on `blur` — set `"include"` or `"exclude"`. |
| Output height different from expected | `resize` maintains aspect ratio | Only `targetWidth` is specified; height is derived from the original aspect ratio. |
| Animated source collapsed to one frame | JPEG or PNG output requested | These formats are single-frame; use WebP or GIF output to preserve animation. |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Image upscaling](./image-upscaler) — chain upscaling before `convertImage` for high-res output in a target format
* [Prompt enhancement](./prompt-enhancement) — another 1-Buzz utility step

---

---
url: /orchestration/recipes/image-upscaler.md
---

# Image upscaling

The `imageUpscaler` step type takes an image and returns a higher-resolution version. The **upscaler model** sets the scale factor per pass (a "4×" model like [4x-Remacri](https://civitai.com/models/147759/remacri?modelVersionId=164821) — the default — applies a 4× enlargement in one run). You can then run the same model up to 3 times in one step via `numberOfRepeats` for compounding scale.

Common uses:

* Finishing step after image generation (chain `imageGen` → `imageUpscaler`)
* Rescuing low-resolution assets
* Preparing images for print / large-format display

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A source image — URL, data URL, or Base64 string

## The simplest request

Use the per-recipe endpoint when you're just upscaling one image and don't need webhooks or multi-step chaining:

```http
POST https://orchestration.civitai.com/v2/consumer/recipes/imageUpscaler?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "image": "https://image.civitai.com/.../00890-23.jpeg"
}
```

That's it — the defaults run the 4x-Remacri upscaler once. The response is a full [`Workflow`](/orchestration/reference/operations/GetWorkflow) whose single step carries the upscaled blob.

## Via the generic workflow endpoint

Equivalent request through [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — use this path when you need webhooks, tags, or to chain with other steps:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageUpscaler",
    "input": {
      "image": "https://image.civitai.com/.../00890-23.jpeg",
      "numberOfRepeats": 2
    }
  }]
}
```

## Input fields

See the [`ImageUpscalerInput` schema](/orchestration/reference/operations/InvokeImageUpscalerStepTemplate) for the full definition.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `image` | ✅ | — | URL, data URL, or raw Base64 string. Civitai CDN URLs work directly. |
| `model` | | [`4x-Remacri`](https://civitai.com/models/147759/remacri?modelVersionId=164821) (`urn:air:other:upscaler:civitai:147759@164821`) | AIR URN of the upscaler model. The model's own spec determines the **scale factor per pass**. |
| `numberOfRepeats` | | `1` | `1`–`3`. How many times to run the model end-to-end. Total scale ≈ `(model_scale) ^ numberOfRepeats`. |

### Picking a model

Two dimensions to consider:

**Content type** — different upscaler families handle different content best:

* **Photographic / real-world images** — general-purpose upscalers (ESRGAN derivatives like 4x-Remacri, the default).
* **Anime / illustrated art** — anime-tuned upscalers produce cleaner line work.
* **Faces / portraits** — face-restoration–aware upscalers reduce artifacts around features.

**Scale factor** — upscaler models advertise their scale in the name (`2x-…`, `4x-…`, `8x-…`). This is typically the multiplication factor per pass — a `4x` model on a 1024×1024 input produces 4096×4096 output in a single run. Combined with `numberOfRepeats: 2`, a 4× model produces a 16× total enlargement.

Browse [Civitai's upscaler catalog](https://civitai.com/models?tag=upscaler) and pass the AIR URN you want. Leave `model` unset to accept 4x-Remacri.
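
To see how the per-pass scale compounds with `numberOfRepeats`, the arithmetic can be sketched in Python. The function name `upscaled_size` is hypothetical, and the result is the nominal size implied by the model's advertised factor, not a guarantee of the exact output dimensions.

```python
def upscaled_size(width: int, height: int, model_scale: int = 4,
                  number_of_repeats: int = 1) -> tuple[int, int]:
    """Nominal output size: each pass multiplies both sides by model_scale."""
    if not 1 <= number_of_repeats <= 3:
        raise ValueError("numberOfRepeats must be between 1 and 3")
    total_scale = model_scale ** number_of_repeats
    return width * total_scale, height * total_scale


print(upscaled_size(1024, 1024))                       # (4096, 4096)
print(upscaled_size(1024, 1024, number_of_repeats=2))  # (16384, 16384)
```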

## Chaining: generate then upscale

One of the most common two-step workflows — produce at native resolution, then upscale with a single submission:

```json
{
  "steps": [
    {
      "$type": "imageGen",
      "name": "hero",
      "input": {
        "engine": "flux2",
        "model": "klein",
        "operation": "createImage",
        "modelVersion": "4b",
        "prompt": "A cat astronaut floating through neon space",
        "width": 1024,
        "height": 1024
      }
    },
    {
      "$type": "imageUpscaler",
      "name": "hero-4k",
      "input": {
        "image": {
          "$ref": "hero",
          "path": "output.images[0].url"
        },
        "numberOfRepeats": 1
      }
    }
  ]
}
```

The `{ "$ref": "hero", "path": "output.images[0].url" }` reference creates a dependency — `hero-4k` doesn't start until `hero` succeeds, and the upscaler's `image` field is filled in with the generated image's signed URL at runtime. See [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) for the full reference syntax.
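
The dependency that a `$ref` creates can be made visible client-side. The sketch below is an illustration in Python, not the orchestrator's own resolution logic; it assumes the `$ref` value is the bare step name, as in the example above, and the helper name `step_dependencies` is hypothetical.

```python
def step_dependencies(workflow: dict) -> dict[str, set[str]]:
    """Map each step name to the step names it references via `$ref`."""

    def find_refs(node) -> set[str]:
        # Walk nested dicts and lists, collecting every "$ref" target.
        if isinstance(node, dict):
            refs = {node["$ref"]} if isinstance(node.get("$ref"), str) else set()
            for value in node.values():
                refs |= find_refs(value)
            return refs
        if isinstance(node, list):
            return set().union(*(find_refs(item) for item in node))
        return set()

    return {
        step["name"]: find_refs(step.get("input", {}))
        for step in workflow["steps"]
    }
```

For the workflow above this yields an empty set for `hero` and `{"hero"}` for `hero-4k`, which matches the order in which the steps can run.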

## Targeting an exact resolution

Upscalers only know how to multiply (4× per pass with the default model). If you need a specific output width — say, 1920 px wide for a hero image — chain a `convertImage` step after the upscaler to downscale to your exact target.

```json
{
  "steps": [
    {
      "$type": "imageGen",
      "name": "hero",
      "input": {
        "engine": "flux2",
        "model": "klein",
        "operation": "createImage",
        "modelVersion": "4b",
        "prompt": "A cat astronaut floating through neon space",
        "width": 1024,
        "height": 1024
      }
    },
    {
      "$type": "imageUpscaler",
      "name": "upscaled",
      "input": {
        "image": { "$ref": "hero", "path": "output.images[0].url" },
        "numberOfRepeats": 1
      }
    },
    {
      "$type": "convertImage",
      "name": "hero-1920",
      "input": {
        "image": { "$ref": "upscaled", "path": "output.blob.url" },
        "transforms": [
          { "type": "resize", "targetWidth": 1920 }
        ],
        "output": {
          "format": "webp",
          "quality": 85,
          "lossless": false,
          "hideMetadata": true
        }
      }
    }
  ]
}
```

What happens at runtime:

1. **`hero`** generates a 1024×1024 image.
2. **`upscaled`** runs 4x-Remacri once → 4096×4096 (intermediate, oversized).
3. **`hero-1920`** downsamples to 1920 px wide (height is derived from the aspect ratio, giving 1920×1920 here) and re-encodes as WebP at quality 85.

`ResizeTransform` keeps aspect ratio — set only `targetWidth` (1–4096). For other format / quality knobs see the [`ConvertImageInput` schema](/orchestration/reference/operations/InvokeConvertImageStepTemplate); supported `format` values are `jpeg`, `png`, `webp`, `gif`.
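
The derived height can be estimated ahead of time. This is a rough Python sketch with a hypothetical helper name; rounding to the nearest pixel is an assumption, and the service may round differently.

```python
def resized_dimensions(width: int, height: int, target_width: int) -> tuple[int, int]:
    """Estimate the output size of a width-only resize that keeps aspect ratio."""
    if not 1 <= target_width <= 4096:
        raise ValueError("targetWidth must be between 1 and 4096")
    # Assumption: nearest-pixel rounding, never below 1 px.
    target_height = max(1, round(height * target_width / width))
    return target_width, target_height


print(resized_dimensions(4096, 4096, 1920))  # (1920, 1920)
```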

## Reading the result

A successful `imageUpscaler` step emits a single upscaled image blob:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageUpscaler",
    "status": "succeeded",
    "output": {
      "blob": { "id": "blob_...", "url": "https://.../signed.png" }
    }
  }]
}
```

Note: `imageUpscaler` output is `blob` (singular), not `blobs[]` — the step always returns exactly one image.

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) to get a fresh URL.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Cost scales with input pixel area and the total scale factor applied by `numberOfRepeats`:

```
inputMegapixels = width × height / 1 000 000
scale           = 2 ^ numberOfRepeats      // cost factor: 2 per pass with the default 4x-Remacri
total           = max(1, ceil(inputMegapixels)) × scale
```

| Shape | Buzz |
|-------|------|
| 512×512 input, `numberOfRepeats: 1` | **2** |
| 1024×1024 input, `numberOfRepeats: 1` | **4** |
| 2048×2048 input, `numberOfRepeats: 1` | **10** |
| 1024×1024 input, `numberOfRepeats: 2` | **8** |
| 1024×1024 input, `numberOfRepeats: 3` | **16** |

Upscaling is one of the cheapest operations exposed — even aggressive stacked passes on a 2-megapixel source land under a few dozen Buzz. The practical ceiling is usually the [upscaler's content-size cap](#runtime), not cost.
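
The formula above translates directly into a few lines of Python. This sketch covers the default model only, uses a hypothetical function name, and is meant for rough budgeting; `whatif=true` remains the way to get an exact figure.

```python
import math


def upscale_cost(width: int, height: int, number_of_repeats: int = 1) -> int:
    """Estimate Buzz for one imageUpscaler step with the default model."""
    if not 1 <= number_of_repeats <= 3:
        raise ValueError("numberOfRepeats must be between 1 and 3")
    input_megapixels = width * height / 1_000_000
    scale = 2 ** number_of_repeats
    return max(1, math.ceil(input_megapixels)) * scale


print(upscale_cost(1024, 1024, 1))  # 4
print(upscale_cost(2048, 2048, 1))  # 10
```

Note that 1024×1024 is slightly over one megapixel (1.048576), so it rounds up to 2 before the scale factor is applied.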

## Runtime

A single pass (`numberOfRepeats: 1`) with the default 4x-Remacri on a ~1-megapixel input usually completes in 5–15 s and fits inside `wait=60`. Multiple repeats stack both runtime *and* output size — `numberOfRepeats: 3` with a 4× model produces a 64× enlargement, which will exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) and is rarely what you actually want. Use `wait=0` plus webhooks/polling for anything beyond one pass, and keep the total scale in mind before cranking `numberOfRepeats`.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "image could not be loaded" | URL not publicly reachable, or data URL malformed | Make sure the URL is fetchable without auth; re-encode the Base64 payload. |
| `400` with "numberOfRepeats out of range" | Value outside `1`–`3` | Clamp client-side. |
| Output looks soft / painterly | Default model mismatch for this content | Specify a content-appropriate `model` AIR (anime-tuned for illustration, face-aware for portraits, etc.). |
| Output has halos or ringing | `numberOfRepeats` too aggressive for the source | Drop to a single pass; or pre-denoise the source. |
| Step `failed`, `reason = "blocked"` | Source image hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`InvokeImageUpscalerStepTemplate`](/orchestration/reference/operations/InvokeImageUpscalerStepTemplate) — the per-recipe endpoint
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageUpscaler/openapi.yaml) — standalone OpenAPI 3.1 YAML for this endpoint, ready to import into Postman / Insomnia / OpenAPI Generator
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — generic path for chaining
* [Video upscaling](./video-upscaler) — the `videoUpscaler` equivalent for video
* [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) — how `$ref` step-output references work

---

---
url: /site/reference/images.md
description: >-
  Browse images posted to Civitai, with filters for post, model, version, and
  creator.
---

# Images

Images are user-submitted outputs attached to posts. This endpoint powers the
gallery on civitai.com.

## List images

```
GET /api/v1/images
```

**Auth:** Public. Authenticated callers see content up to their configured
browsing level; anonymous callers are capped at the public browsing level.

### Query parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `limit` | integer (0–200) | 50 | Number of items per page. |
| `page` | integer | — | 1-indexed page number. Incompatible with `cursor`. |
| `cursor` | string | — | Opaque cursor; use `metadata.nextCursor` from the previous response. |
| `postId` | integer | — | Restrict to a specific post. |
| `modelId` | integer | — | Images associated with any version of a model. |
| `modelVersionId` | integer | — | Images associated with a specific version. |
| `imageId` | integer | — | Single-image lookup. |
| `username` | string | — | Filter by uploader username. Auto-slugified. |
| `userId` | integer | — | Filter by uploader user ID. |
| `period` | `AllTime` \| `Year` \| `Month` \| `Week` \| `Day` | `AllTime` | Time window for sort metrics. |
| `sort` | `Most Reactions` \| `Most Comments` \| `Most Collected` \| `Newest` \| `Oldest` \| `Random` | `Most Reactions` | |
| `nsfw` | `None` \| `Soft` \| `Mature` \| `X` \| boolean | — | Legacy NSFW filter; prefer `browsingLevel`. |
| `browsingLevel` | integer (bitmask) | — | Raw browsing-level bitmask. Takes precedence over `nsfw`. |
| `tags` | comma-separated integers | — | Tag IDs to require on each image. |
| `type` | `image` \| `video` \| `audio` | — | Media type. |
| `baseModels` | comma-separated strings | — | Filter to outputs from specific base models. |
| `withMeta` | boolean | `false` | If `true`, include the full `meta` object (prompt, resources, etc.). |

### Response

```json
{
  "items": [
    {
      "id": 9173928,
      "url": "https://image.civitai.com/.../cc242d6c-f960-4274-aa1d-f22a71e705ef.jpeg",
      "hash": "UA8N5},:Ioni~C#laKxaoznNwvx]XmRkVstR",
      "width": 832,
      "height": 1216,
      "type": "image",
      "nsfw": true,
      "nsfwLevel": "Soft",
      "browsingLevel": 2,
      "createdAt": "2025-04-17T21:28:57.225Z",
      "postId": 1981754,
      "username": "Ajuro",
      "baseModel": "SDXL 1.0",
      "modelVersionIds": [9208, 249861, 258687, 332071, 345685],
      "stats": {
        "cryCount": 1770,
        "laughCount": 2771,
        "likeCount": 21692,
        "dislikeCount": 0,
        "heartCount": 8044,
        "commentCount": 58
      },
      "meta": {
        "Size": "832x1216",
        "seed": 1938345220,
        "steps": 45,
        "sampler": "DPM++ 2M",
        "cfgScale": 5,
        "clipSkip": 2,
        "prompt": "...",
        "negativePrompt": "...",
        "resources": [],
        "civitaiResources": [
          { "type": "checkpoint", "modelVersionId": 345685 },
          { "type": "lora", "weight": 0.65, "modelVersionId": 249861 }
        ]
      }
    }
  ],
  "metadata": {
    "nextCursor": "1|1744925337225",
    "nextPage": "https://civitai.com/api/v1/images?limit=100&cursor=..."
  }
}
```

### Field notes

* `nsfwLevel` is the **string** form (`None`, `Soft`, `Mature`, `X`).
  `browsingLevel` is the raw bitmask — use this for precise filtering.
* `hash` is a BlurHash, suitable for rendering a placeholder while the
  `url` loads.
* `meta` is present only when the uploader included metadata at post time.
  The most common fields are listed above, but the object is free-form —
  tools like Automatic1111 and ComfyUI drop in their own keys. Treat unknown
  keys as opaque.
* `civitaiResources` inside `meta` maps each referenced resource to its
  Civitai `modelVersionId`, so you can round-trip back to
  [`GET /model-versions/{id}`](./model-versions).
* `modelVersionIds` at the top level is a deduped list of every model
  version referenced in `meta.civitaiResources`.

### Notes

* Page-based pagination is capped at `page * limit ≤ 1000`; deep traversal
  requires `cursor`. See [Pagination](../guide/pagination).
* On Civitai's "green" domain or from restricted regions, results are
  filtered to SFW regardless of the `nsfw` / `browsingLevel` parameter.
* `/images` defaults to `limit=50`. Lower it explicitly if you're only after
  a handful, or raise it up to `200` for fewer round-trips.

### Examples

```bash
# Newest images for a specific model
curl "https://civitai.com/api/v1/images?modelId=827184&sort=Newest&limit=10"

# All images in a post, with full generation metadata
curl "https://civitai.com/api/v1/images?postId=1981754&withMeta=true"

# Cursor-based traversal
curl "https://civitai.com/api/v1/images?limit=100" | jq '.metadata.nextCursor'
```

::: warning
Filtering by `modelId` on an extremely popular checkpoint (hundreds of
thousands of images) can exceed Cloudflare's 30s timeout. For large models,
fetch by `postId` or walk `cursor`-based pagination with `limit=100` instead
of sorting the whole set.
:::
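
Cursor-based traversal can be wrapped in a small generator. The following is a minimal Python sketch using only the standard library; the helper names, the `max_items` safety cap, and the assumption that a missing `nextCursor` marks the last page are illustrative and not part of the API.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_URL = "https://civitai.com/api/v1/images"


def fetch_page(params: dict) -> dict:
    """Fetch one page of results and decode the JSON body."""
    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_images(params: dict, fetch: Callable[[dict], dict] = fetch_page,
                max_items: int = 500) -> Iterator[dict]:
    """Yield images across pages by following `metadata.nextCursor`."""
    query = dict(params)
    yielded = 0
    while True:
        page = fetch(query)
        for item in page.get("items", []):
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        cursor = page.get("metadata", {}).get("nextCursor")
        if not cursor:
            return  # assumed: no nextCursor means there are no more pages
        query["cursor"] = cursor
```

A call such as `iter_images({"modelId": 827184, "sort": "Newest", "limit": 100})` would then walk the result set page by page. The `fetch` parameter is injectable so the loop can be exercised without network access.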

---

---
url: /orchestration/guide.md
---

# Introduction

The Civitai Orchestrator is an API for running AI workloads — video generation, image generation, upscaling, transcription, text-to-speech, and more — without managing the underlying infrastructure.

You submit a **workflow**: a small JSON document describing what you want done. The orchestrator:

1. Converts workflow steps into **jobs**
2. Races multiple **providers** (FAL, Google, Bytedance, Civitai workers, and others) to claim each job
3. Streams results back — blobs (images/video/audio), text, or structured output

You get a single contract. The orchestrator handles provider selection, capacity, retries, and capability matching behind it.

## When to use this API

* You want to generate or transform media (video, image, audio, 3D) at scale
* You want provider redundancy without writing provider-specific code
* You want job tracking, webhooks, and resumable workflows out of the box
* You already have an AIR (Civitai resource identifier) and want to run inference against it

## Next steps

* [Quick start](./getting-started) — your first request in 5 minutes
* [Recipes](/orchestration/recipes/) — end-to-end examples (WAN video, Flux images, upscaling…)
* [API reference](/orchestration/reference/) — every operation, schema, and response

---

---
url: /orchestration/recipes/kling.md
---

# Kling video generation

Kuaishou's Kling model family is available in two generations through the `videoGen` step:

| `engine` | Models | Notes |
|----------|--------|-------|
| `kling` | `v1`, `v1.5`, `v1.6`, `v2`, `v2.5-turbo` | Original Kling. Text-to-video and image-to-video. |
| `kling-v3` | *(version-agnostic)* | Kling V3. Five operations including video-to-video and reference-to-video. Duration in seconds (3–15). |

**Default choice for new integrations**: `engine: "kling-v3"` with `operation: "text-to-video"`. For speed and lower cost, use `mode: "Standard"`; for highest quality, use `mode: "Professional"`.

All Kling jobs exceed the [100-second timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) — always submit with `wait=0` and handle results via webhooks or polling.

## Kling (original)

### Text-to-video

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "kling",
      "model": "v2.5-turbo",
      "prompt": "A serene mountain lake at dawn with mist rolling over the water",
      "aspectRatio": "16:9",
      "duration": "5"
    }
  }]
}
```

### Image-to-video

Pass `sourceImage` (URL, data URL, or Base64) to animate a start frame:

```json
{
  "engine": "kling",
  "model": "v1.6",
  "prompt": "The subject slowly turns to face the camera",
  "sourceImage": "https://image.civitai.com/.../photo.jpeg",
  "aspectRatio": "16:9",
  "duration": "5",
  "mode": "Standard"
}
```

### Parameters

| Field | Default | Notes |
|-------|---------|-------|
| `engine` | — ✅ | `"kling"` |
| `model` | — ✅ | `"v1"` / `"v1.5"` / `"v1.6"` / `"v2"` / `"v2.5-turbo"` |
| `prompt` | — ✅ | Generation prompt. |
| `negativePrompt` | `null` | What to avoid. |
| `mode` | `"Standard"` | `"Standard"` or `"Professional"`. Affects quality and cost for v1/v1.5/v1.6. Ignored for v2/v2.5-turbo. |
| `aspectRatio` | `"16:9"` | `"16:9"`, `"9:16"`, `"1:1"` |
| `duration` | `"5"` | `"5"` or `"10"` (seconds). String enum. |
| `cfgScale` | `0.5` | 0–1. Prompt adherence. |
| `sourceImage` | `null` | URL / data URL / Base64. Enables image-to-video. |
| `cameraControl` | `null` | Fine camera motion — see [Camera control](#camera-control) below. |

### Cost

| Model | 5 s | 10 s |
|-------|-----|------|
| `v1` / `v1.5` / `v1.6` Standard | **600** | **1 200** |
| `v1` / `v1.5` / `v1.6` Professional | **1 050** | **2 100** |
| `v2` | **1 200** | **2 400** |
| `v2.5-turbo` | **600** | **1 200** |

### Camera control

Available on all models. Provide a `cameraControl` object with a `config` sub-object containing any of these axes (each from −10 to 10; the default `null` applies no control):

| Axis | Effect |
|------|--------|
| `horizontal` | Translate left (−) / right (+) |
| `vertical` | Translate down (−) / up (+) |
| `pan` | Rotate left (−) / right (+) around Y axis |
| `tilt` | Rotate down (−) / up (+) around X axis |
| `roll` | Counter-clockwise (−) / clockwise (+) around Z axis |
| `zoom` | Narrow FOV (−) / widen FOV (+) |

```json
{
  "cameraControl": {
    "config": { "zoom": -3, "pan": 2 }
  }
}
```
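
A small builder can catch out-of-range axes before the request is sent. This Python sketch assumes only the axis names and the −10 to 10 range from the table above; the function name `camera_control` is hypothetical.

```python
CAMERA_AXES = ("horizontal", "vertical", "pan", "tilt", "roll", "zoom")


def camera_control(**axes: float) -> dict:
    """Build a `cameraControl` fragment, validating axis names and ranges."""
    config = {}
    for name, value in axes.items():
        if name not in CAMERA_AXES:
            raise ValueError(f"unknown camera axis: {name}")
        if not -10 <= value <= 10:
            raise ValueError(f"{name} must be between -10 and 10")
        config[name] = value
    return {"cameraControl": {"config": config}}


print(camera_control(zoom=-3, pan=2))
# {'cameraControl': {'config': {'zoom': -3, 'pan': 2}}}
```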

***

## Kling V3 (`engine: "kling-v3"`)

Kling V3 introduces a richer operation set via the `operation` discriminator.

### Operations

| `operation` | Description | Key inputs |
|-------------|-------------|------------|
| `text-to-video` | Generate from a text prompt | `prompt` |
| `image-to-video` | Animate a start frame (optionally to an end frame) | `sourceImage`, optionally `endImage` |
| `reference-to-video` | Stylize video from reference images | `images[]` |
| `video-to-video-edit` | Edit an existing video guided by a prompt | `videoUrl` |
| `video-to-video-reference` | Reference an existing video's motion/structure | `videoUrl`, optionally `images[]` |

### Text-to-video

```json
{
  "engine": "kling-v3",
  "operation": "text-to-video",
  "prompt": "A timelapse of a flower blooming in a sunlit meadow",
  "aspectRatio": "16:9",
  "duration": 5,
  "mode": "Standard"
}
```

### Image-to-video

```json
{
  "engine": "kling-v3",
  "operation": "image-to-video",
  "prompt": "The cat stretches and yawns, then looks directly into the camera",
  "sourceImage": "https://image.civitai.com/.../photo.jpeg",
  "aspectRatio": "16:9",
  "duration": 5
}
```

Add `endImage` to interpolate between a start frame and an end frame:

```json
{
  "engine": "kling-v3",
  "operation": "image-to-video",
  "prompt": "Smooth cinematic transition",
  "sourceImage": "https://.../start.jpeg",
  "endImage":   "https://.../end.jpeg",
  "duration": 5
}
```

::: warning Placeholder URLs
The first-last-frame example uses elided `https://.../` placeholder URLs. Replace them with publicly accessible image URLs before submitting.
:::

### Video-to-video

Edit or reference the motion of an existing video:

```json
{
  "engine": "kling-v3",
  "operation": "video-to-video-edit",
  "prompt": "Transform the scene into a vintage 1970s film aesthetic with grain",
  "videoUrl": "https://example.com/input.mp4",
  "duration": 5,
  "mode": "Standard"
}
```

Use `video-to-video-reference` to guide generation from a video's motion without directly editing it.

### Multi-prompt (Kling V3)

`multiPrompt` lets you sequence different prompts across a video timeline. Each entry has a `prompt` and a `duration` (seconds that prompt controls):

```json
{
  "engine": "kling-v3",
  "operation": "text-to-video",
  "prompt": "Base scene description",
  "multiPrompt": [
    { "prompt": "The camera slowly pushes in on the subject", "duration": 3 },
    { "prompt": "The subject looks up and the scene brightens", "duration": 4 }
  ]
}
```

### Audio generation (Kling V3)

Set `generateAudio: true` to produce a synchronized audio track. Optionally provide `voiceIds` to use a specific voice profile:

```json
{
  "generateAudio": true,
  "voiceIds": ["voice_abc123"]
}
```

For video-to-video operations, `keepAudio: true` (default) preserves the original video's audio.

### Parameters (Kling V3)

| Field | Default | Notes |
|-------|---------|-------|
| `engine` | — ✅ | `"kling-v3"` |
| `operation` | `"text-to-video"` | See operations table above. |
| `prompt` | — ✅ | Generation prompt. |
| `mode` | `"Standard"` | `"Standard"` or `"Professional"`. |
| `duration` | `5` | 3–15 seconds (integer, unlike the original `kling` engine). |
| `aspectRatio` | `"16:9"` | `"16:9"`, `"9:16"`, `"1:1"` |
| `sourceImage` | `null` | Start frame for `image-to-video`. |
| `endImage` | `null` | End frame for first-last-frame interpolation. |
| `images[]` | `[]` | Reference images for `reference-to-video`. |
| `videoUrl` | `null` | Source video for `video-to-video-*` operations. |
| `generateAudio` | `false` | Generate a synchronized audio track. |
| `voiceIds` | `null` | Voice profile IDs for audio generation. |
| `keepAudio` | `true` | Preserve source video audio in video-to-video operations. |
| `multiPrompt[]` | `null` | Time-sequenced prompts `{ prompt, duration }`. |

### Cost (Kling V3)

Cost scales linearly with `duration`. All costs are in Buzz per second:

| Operation group | Mode | Audio | Buzz/s |
|-----------------|------|-------|--------|
| t2v / i2v / ref | Standard | No | 219 |
| t2v / i2v / ref | Standard | Yes | 292 |
| t2v / i2v / ref | Professional | No | 292 |
| t2v / i2v / ref | Professional | Yes | 364 |
| v2v-edit / v2v-ref | Standard | — | 328 |
| v2v-edit / v2v-ref | Professional | — | 437 |

Examples at `duration: 5`:

| Scenario | Buzz |
|----------|------|
| Standard t2v, no audio, 5 s | **~1 095** |
| Standard t2v, with audio, 5 s | **~1 460** |
| Professional t2v, no audio, 5 s | **~1 460** |
| Professional t2v, with audio, 5 s | **~1 820** |
| Standard video-to-video, 5 s | **~1 640** |
| Professional video-to-video, 5 s | **~2 185** |
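
The per-second rates above can be turned into a quick estimator. This is a Python sketch with hypothetical names; the totals in the examples table are marked approximate, so treat the result as a budgeting figure and not an exact charge.

```python
GENERATE_OPS = {"text-to-video", "image-to-video", "reference-to-video"}
V2V_OPS = {"video-to-video-edit", "video-to-video-reference"}

# Buzz per second, copied from the rate table above.
GENERATE_RATES = {
    ("Standard", False): 219,
    ("Standard", True): 292,
    ("Professional", False): 292,
    ("Professional", True): 364,
}
V2V_RATES = {"Standard": 328, "Professional": 437}


def kling_v3_cost(operation: str, duration: int, mode: str = "Standard",
                  generate_audio: bool = False) -> int:
    """Estimate Buzz for a Kling V3 job as rate-per-second times duration."""
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if operation in GENERATE_OPS:
        rate = GENERATE_RATES[(mode, generate_audio)]
    elif operation in V2V_OPS:
        rate = V2V_RATES[mode]  # the table lists no audio split for video-to-video
    else:
        raise ValueError(f"unknown operation: {operation}")
    return rate * duration


print(kling_v3_cost("text-to-video", 5))                       # 1095
print(kling_v3_cost("video-to-video-edit", 5, "Professional"))  # 2185
```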

***

## Reading the result

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Long-running jobs

Kling V3 Standard at 5 s typically completes in 2–5 minutes; Professional and longer durations take longer. Always use `wait=0` and handle via:

* **Webhooks** (recommended): `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks)
* **Polling**: `GET /v2/consumer/workflows/{workflowId}` on a 10 s → 30 s → 60 s cadence
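
The polling cadence can be sketched as a delay generator plus a loop. In this Python sketch the names are hypothetical, `get_workflow` stands in for whatever function performs the `GET` request, the cadence is assumed to stay at 60 s after the third poll, and only `succeeded` and `failed` are treated as terminal because those are the statuses shown on this page.

```python
import itertools
import time
from typing import Callable, Iterator

# Assumption: other terminal statuses may exist; extend this set as needed.
TERMINAL_STATUSES = {"succeeded", "failed"}


def poll_delays() -> Iterator[int]:
    """Yield delays in seconds: 10, then 30, then 60 repeatedly."""
    yield 10
    yield 30
    yield from itertools.repeat(60)


def wait_for_workflow(get_workflow: Callable[[], dict],
                      sleep: Callable[[float], None] = time.sleep,
                      max_polls: int = 30) -> dict:
    """Poll until the workflow reaches a terminal status or polls run out."""
    for delay in itertools.islice(poll_delays(), max_polls):
        workflow = get_workflow()
        if workflow.get("status") in TERMINAL_STATUSES:
            return workflow
        sleep(delay)
    raise TimeoutError("workflow did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.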

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "duration must be one of" (kling) | Sent integer instead of string | The original `kling` engine uses string duration: `"5"` or `"10"`. |
| `400` with "model is required" (kling) | Missing `model` on the original engine | `model` is required for `kling`; it is not used by `kling-v3`. |
| `400` with "sourceImage is required" | Used `image-to-video` without an image | Provide `sourceImage` for `image-to-video`. |
| `400` with "videoUrl is required" | Used `video-to-video-*` without a source video | Provide `videoUrl` for video-to-video operations. |
| Step `failed`, `reason = "no_provider_available"` | No Kling worker available | Retry shortly. |
| Output doesn't match end frame | `endImage` ignored for `text-to-video` | Use `operation: "image-to-video"` with both `sourceImage` and `endImage` to interpolate frames. |
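
Several of the `400` cases above can be caught before submitting. The following Python sketch checks a `videoGen` `input` object against the rules stated on this page; the function name and message wording are hypothetical, and it does not cover every server-side validation.

```python
def check_kling_input(body: dict) -> list[str]:
    """Return likely validation problems for a Kling `videoGen` input object."""
    problems = []
    engine = body.get("engine")
    operation = body.get("operation", "text-to-video")
    duration = body.get("duration")
    if engine == "kling":
        if "model" not in body:
            problems.append("model is required for engine 'kling'")
        # The original engine takes duration as a string enum.
        if duration is not None and duration not in ("5", "10"):
            problems.append("duration must be the string '5' or '10'")
    elif engine == "kling-v3":
        if duration is not None and not (isinstance(duration, int) and 3 <= duration <= 15):
            problems.append("duration must be an integer from 3 to 15")
        if operation == "image-to-video" and not body.get("sourceImage"):
            problems.append("sourceImage is required for image-to-video")
        if operation.startswith("video-to-video-") and not body.get("videoUrl"):
            problems.append("videoUrl is required for video-to-video operations")
    else:
        problems.append(f"unknown engine: {engine!r}")
    return problems
```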

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling
* [WAN video generation](./wan) — comparable open-source alternative
* [Veo 3 video generation](./veo3) — Google alternative for commercial-grade video

---

---
url: /orchestration/recipes/ltx2.md
---

# LTX2 video generation

LTX2 is Lightricks' open video-generation model family. The orchestrator exposes both LTX2 and the newer LTX2.3 through the `videoGen` step, running on Civitai's ComfyUI workers. This recipe covers both versions end-to-end.

## Versions at a glance

| `engine` | Models | Operations | Notes |
|----------|--------|------------|-------|
| `ltx2.3` | `22b-dev`, `22b-distilled` | `createVideo`, `firstLastFrameToVideo`, `editVideo`, `extendVideo`, `videoToVideo`, `audioToVideo` | Current release. Adds style transfer (`videoToVideo`) and audio-driven talking-head generation (`audioToVideo`). |
| `ltx2` | `19b-dev`, `19b-distilled` | `createVideo`, `firstLastFrameToVideo`, `editVideo`, `extendVideo` | Previous release. Still supported. |

**Default choice for new integrations**: `engine: "ltx2.3"`, `model: "22b-distilled"` for speed, `"22b-dev"` for maximum quality.

## The request shape

Every LTX2 request is a single `videoGen` step on [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). Three keys select which LTX2 variant runs:

```json
{
  "$type": "videoGen",
  "input": {
    "engine":    "ltx2.3",       // ltx2 | ltx2.3
    "operation": "createVideo",  // see table above
    "model":     "22b-distilled" // version-specific
  }
}
```

There's no `provider` discriminator — LTX2 currently only runs on Comfy. Each combination dispatches to its typed input schema (`ComfyLtx23CreateVideoInput`, `ComfyLtx2EditVideoInput`, …) so fields invalid for that combination get rejected with a `400`.
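
The valid combinations from the versions table can be encoded in a lookup so that mismatches surface before the request is sent. This is a Python sketch with hypothetical names; it mirrors only the table on this page and says nothing about per-operation field validation.

```python
LTX2_VARIANTS = {
    "ltx2.3": {
        "models": {"22b-dev", "22b-distilled"},
        "operations": {
            "createVideo", "firstLastFrameToVideo", "editVideo",
            "extendVideo", "videoToVideo", "audioToVideo",
        },
    },
    "ltx2": {
        "models": {"19b-dev", "19b-distilled"},
        "operations": {
            "createVideo", "firstLastFrameToVideo", "editVideo", "extendVideo",
        },
    },
}


def check_ltx2_selection(engine: str, operation: str, model: str) -> list[str]:
    """Return problems with an engine / operation / model combination."""
    variant = LTX2_VARIANTS.get(engine)
    if variant is None:
        return [f"unknown engine: {engine}"]
    problems = []
    if operation not in variant["operations"]:
        problems.append(f"{operation} is not available on {engine}")
    if model not in variant["models"]:
        problems.append(f"{model} is not a {engine} model")
    return problems
```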

### Source-media inputs

`editVideo`, `extendVideo`, `videoToVideo`, and `audioToVideo` accept `sourceVideo` / `sourceAudio` as either:

* a Civitai AIR URN (`urn:air:…`), or
* a civitai-hosted URL (`image.civitai.com`, orchestrator blob URLs, civitai-managed R2 / B2 / Spaces).

Arbitrary third-party URLs (e.g. `raw.githubusercontent.com`, `cdn.jsdelivr.net`) are **not** fetched — requests that pass one are rejected with a `400`. Upload the media to Civitai first and pass the resulting URL. `images`, `firstFrame`, `lastFrame`, and `referenceImage` go through a separate image pipeline and *do* accept external URLs — only video/audio inputs have this restriction today.
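
A pre-flight check for `sourceVideo` / `sourceAudio` values might look like the Python sketch below. Only `image.civitai.com` is named explicitly above, so the host allowlist is an assumption that needs extending with the hosts of the orchestrator blob and storage URLs you actually receive; the function name is hypothetical.

```python
from urllib.parse import urlparse

# Assumption: extend with the hosts of your orchestrator blob / storage URLs.
DEFAULT_HOSTS = frozenset({"image.civitai.com"})


def is_accepted_source_media(value: str, hosts=DEFAULT_HOSTS) -> bool:
    """Return True for AIR URNs and URLs whose host is in the allowlist."""
    if value.startswith("urn:air:"):
        return True
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and parsed.hostname in hosts
```

A `False` result here is a hint to upload the media to Civitai first; the server's own check is what decides whether the request is rejected.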

## Operations

All examples target production and use `<your-token>` in place of your Bearer token. LTX2 jobs typically exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) — submit with `wait=0` and handle completion via webhooks or polling.

### createVideo

A single operation covers both **text-to-video** and **image-to-video**: add `images` to turn any text-to-video request into image-to-video.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?whatif=false&wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "ltx2.3",
      "operation": "createVideo",
      "model": "22b-distilled",
      "prompt": "A beautiful sunset over the ocean with waves crashing",
      "duration": 5,
      "width": 1280,
      "height": 720,
      "fps": 24,
      "generateAudio": false,
      "guidanceScale": 4,
      "numInferenceSteps": 20
    }
  }]
}
```

Image-to-video: pass one or more images via `images`.

```json
{
  "engine": "ltx2.3",
  "operation": "createVideo",
  "model": "22b-dev",
  "prompt": "The cat starts walking and exploring",
  "images": [
    "https://image.civitai.com/.../42750475.jpeg"
  ],
  "duration": 5,
  "width": 1280,
  "height": 720,
  "fps": 24
}
```
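
Because the two modes differ only by the `images` field, one helper can build both request bodies. This is a minimal sketch: `create_video_workflow` is a hypothetical name, and the field names are copied from the examples above rather than from the full schema.

```python
def create_video_workflow(prompt, images=None, model="22b-distilled", **extra):
    """Build a videoGen workflow body for the createVideo operation.

    Passing images turns the text-to-video request into image-to-video.
    Extra keyword arguments (duration, width, fps, ...) are passed through.
    """
    body = {
        "engine": "ltx2.3",
        "operation": "createVideo",
        "model": model,
        "prompt": prompt,
        **extra,
    }
    if images:
        body["images"] = list(images)
    return {"steps": [{"$type": "videoGen", "input": body}]}
```

Serialize the returned dictionary with `json.dumps` and send it as the `POST` body shown earlier.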

### firstLastFrameToVideo

Interpolate between two keyframes (or extend from a single first frame).

```json
{
  "engine": "ltx2.3",
  "operation": "firstLastFrameToVideo",
  "model": "22b-dev",
  "prompt": "smooth transition from morning to night",
  "firstFrame": "https://.../start.jpeg",
  "lastFrame":  "https://.../end.jpeg",
  "frameGuideStrength": 0.8,
  "duration": 5,
  "width": 1280,
  "height": 720,
  "fps": 24
}
```

Omit `lastFrame` to seed the motion from just the first frame.

### editVideo

Input video + prompt → transformed video. Uses Canny edge-maps for structural preservation.

```json
{
  "engine": "ltx2.3",
  "operation": "editVideo",
  "model": "22b-dev",
  "prompt": "Transform the scene into a cyberpunk aesthetic with neon lighting",
  "sourceVideo": "https://.../input.mp4",
  "cannyLowThreshold": 0.4,
  "cannyHighThreshold": 0.8,
  "guideStrength": 0.7
}
```

### extendVideo

Continue an existing clip for `numFrames` more frames.

```json
{
  "engine": "ltx2.3",
  "operation": "extendVideo",
  "model": "22b-dev",
  "prompt": "The scene continues with gentle camera push-in",
  "sourceVideo": "https://.../clip.mp4",
  "numFrames": 48,
  "fps": 24
}
```

### videoToVideo *(LTX2.3 only)*

Style-transfer an entire video.

```json
{
  "engine": "ltx2.3",
  "operation": "videoToVideo",
  "model": "22b-dev",
  "prompt": "Rendered in the style of a watercolor painting",
  "sourceVideo": "https://.../clip.mp4"
}
```

### audioToVideo *(LTX2.3 only)*

Audio-driven generation. With just `sourceAudio`, produces a matching visual scene; add `referenceImage` for talking-head / lip-sync output.

```json
{
  "engine": "ltx2.3",
  "operation": "audioToVideo",
  "model": "22b-dev",
  "prompt": "A person speaks directly to camera with natural lip movements",
  "negativePrompt": "frozen lips, off-sync lips, blurry",
  "sourceAudio": "https://.../voiceover.mp3",
  "referenceImage": "https://.../portrait.jpeg",
  "audioToVideoAttentionScale": 2.0,
  "imageGuideStrength": 0.7,
  "duration": 5,
  "width": 1280,
  "height": 720,
  "fps": 24
}
```

## Common parameters

Shared across most (engine, operation) combinations. The per-variant schema in the [API reference](/orchestration/reference/) is authoritative.

| Field | Typical values | Notes |
|-------|----------------|-------|
| `model` | `22b-dev` / `22b-distilled` (2.3); `19b-dev` / `19b-distilled` (2.0) | `-distilled` is faster with slightly lower fidelity; `-dev` is maximum quality. |
| `width` / `height` | `1280×720`, `720×1280`, `1024×1024` | Vertical for phones: swap to `720×1280`. |
| `duration` | `3`–`20` seconds | Clip length in seconds. The examples on this page use `5`. |
| `fps` | `24`, `30` | Frame rate of the generated clip. |
| `guidanceScale` | `3`–`7` | Prompt adherence. Higher = closer to prompt but less creative. |
| `numInferenceSteps` | `8`–`50` | `20`–`40` is the practical quality sweet spot. More steps = higher quality, longer runtime. |
| `generateAudio` | `true` / `false` | Emit a soundtrack alongside the video. |
| `negativePrompt` | string | What you *don't* want. |
| `seed` | integer | Reproducibility. |
| `loras` | object | Attach community LoRAs to bias style or subject. Format: `{ "urn:air:lora:civitai:<modelId>@<versionId>": 0.8 }` — a dictionary keyed by AIR URN with the strength as the value. |
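
The `loras` dictionary is easy to get wrong by hand, so here is a small sketch of building it. `lora_key` and `build_loras` are hypothetical helper names; the URN pattern is the one given in the table, and the IDs in the usage line are placeholders.

```python
def lora_key(model_id: int, version_id: int) -> str:
    """Format the AIR URN used as a key in the loras dictionary."""
    return f"urn:air:lora:civitai:{model_id}@{version_id}"


def build_loras(entries):
    """Map (model_id, version_id, strength) tuples to the loras object."""
    return {lora_key(m, v): strength for m, v, strength in entries}


# Placeholder IDs, for illustration only.
loras = build_loras([(123, 456, 0.8)])
```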

## Choosing a model

| Need | Pick |
|------|------|
| Fastest turnaround, batch generation | `22b-distilled` (or `19b-distilled`) |
| Highest fidelity, final-quality renders | `22b-dev` |
| Parity with an older pipeline | `19b-dev` / `19b-distilled` |

## Reading the result

Same as any `videoGen` step — a single `video` blob per clip:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
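
Pulling the signed URLs out of that envelope can be sketched as follows. `video_urls` is a hypothetical helper that assumes exactly the response shape shown above; steps that have not succeeded or carry no `video` output are skipped.

```python
def video_urls(workflow: dict) -> list:
    """Collect the signed video URLs from every succeeded step."""
    urls = []
    for step in workflow.get("steps", []):
        if step.get("status") != "succeeded":
            continue
        video = (step.get("output") or {}).get("video")
        if video and video.get("url"):
            urls.append(video["url"])
    return urls
```

Download promptly after extracting, since the URLs expire.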

## Long-running jobs

LTX2.3 `22b-dev` at 1280×720 / 5 s typically runs 2–5 minutes; `editVideo` and `audioToVideo` can go longer. All of these exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline), so prefer `wait=0` and:

* **Webhooks** (recommended): register a callback with `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks)
* **Polling**: `GET /v2/consumer/workflows/{workflowId}` on a 10 s → 30 s → 60 s cadence
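
The polling cadence can be sketched as a generator plus a small loop. Everything here is illustrative: `fetch` stands in for whatever function performs the `GET` and returns the parsed workflow, and treating `succeeded` and `failed` as the only terminal statuses is an assumption based on the webhook event names above.

```python
import itertools
import time


def poll_delays(schedule=(10, 30, 60)):
    """Yield 10, then 30, then 60 seconds forever."""
    yield from schedule[:-1]
    yield from itertools.repeat(schedule[-1])


def _is_terminal(workflow: dict) -> bool:
    # Assumption: only these two statuses end a workflow.
    return workflow.get("status") in {"succeeded", "failed"}


def wait_for_workflow(fetch, is_done=_is_terminal, sleep=time.sleep, max_polls=50):
    """Call fetch() until is_done(workflow), sleeping on the cadence above."""
    delays = poll_delays()
    for _ in range(max_polls):
        workflow = fetch()
        if is_done(workflow):
            return workflow
        sleep(next(delays))
    raise TimeoutError("workflow did not finish within max_polls")
```

Injecting `sleep` keeps the loop testable without real waiting.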

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with unknown field | Field isn't valid for this `(engine, operation)` combo | Check the specific `ComfyLtx<Ver><Op>Input` schema via [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). |
| `400` "'sourceVideo' / 'sourceAudio' must be a Civitai AIR URN…" | Passed an external URL to `sourceVideo` or `sourceAudio` | Re-upload the media to Civitai and use the civitai-hosted URL, or pass a `urn:air:…` URN. See [Source-media inputs](#source-media-inputs). |
| Step `failed`, `reason = "no_provider_available"` | No Comfy worker has the requested model warm | Retry shortly; or try the other model (`-dev` ↔ `-distilled`). |
| Audio-to-video lip-sync poor | Attention scale too low, or audio clipping | Raise `audioToVideoAttentionScale` (e.g. `2.0` → `3.0`); re-encode source audio at constant bitrate. |
| Edit-video loses structure | Canny guide too weak | Raise `guideStrength` (`0.7` → `0.85`) or widen the Canny thresholds. |

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

All LTX2 / LTX2.3 variants use the same formula — pixel volume × a per-pixel rate × a steps multiplier:

```
numFrames          = duration × fps
pixelVolumeInMP    = (width × height × numFrames) / 1 000 000
stepsMultiplier    = steps / 20

total = ceil(pixelVolumeInMP × 0.0008 × 1000 × 1.5 × stepsMultiplier)
```

| Shape | Buzz |
|-------|------|
| 720p (1280×720), 5 s @ 24 fps, `steps: 20` | **~133** |
| 720p, 5 s @ 24 fps, `steps: 40` | ~266 |
| 720p, 10 s @ 24 fps, `steps: 20` | ~266 |
| 1080p (1920×1080), 5 s @ 24 fps, `steps: 20` | ~299 |

`extendVideo` and `editVideo` scale by their total output frame count the same way. LTX2 is the cheapest video-gen path Civitai exposes — expect roughly linear cost growth with pixels × frames × steps.
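
For budgeting, the formula above translates directly into a few lines of Python. `ltx2_buzz` is a hypothetical helper name, and `steps` is assumed to correspond to the request's `numInferenceSteps`; treat the result as an estimate and rely on `whatif=true` for the exact charge.

```python
import math


def ltx2_buzz(width, height, duration_s, fps, steps=20):
    """Estimate the Buzz cost of an LTX2 / LTX2.3 clip from the formula above."""
    num_frames = duration_s * fps
    pixel_volume_mp = (width * height * num_frames) / 1_000_000
    steps_multiplier = steps / 20
    return math.ceil(pixel_volume_mp * 0.0008 * 1000 * 1.5 * steps_multiplier)


print(ltx2_buzz(1280, 720, 5, 24))            # 720p, 5 s, 20 steps
print(ltx2_buzz(1920, 1080, 5, 24, steps=20))  # 1080p, 5 s, 20 steps
```

The two calls reproduce the first and last rows of the table (133 and 299).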

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production-ready result handling
* [WAN video generation](./wan) — comparable recipe for the WAN model family
* Full parameter catalog: the `ComfyLtx23<Operation>Input` and `ComfyLtx2<Operation>Input` schemas in the [API reference](/orchestration/reference/)
* [`videoGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/videoGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `videoGen` surface (WAN, LTX2, Flux, etc.); import into Postman / OpenAPI Generator

---

---
url: /orchestration/recipes/training-ltx2.md
---

# LTX2 video LoRA training

Train a Lightricks LTX video LoRA on a small set of source video clips using AI Toolkit. The output LoRA is usable in [LTX2 video generation](./ltx2).

| `ecosystem` | Base | Buzz / epoch | Notes |
|-------------|------|--------------|-------|
| `ltx2` | `Lightricks/LTX-2` (19B) | variable (formula-based) | Original LTX2. Cost is computed per-step from clip count + duration. |
| `ltx23` | `Lightricks/LTX-2.3` (22B) | 200 (flat) | Newer LTX 2.3. Higher per-epoch cost reflects the heavier model — kept high deliberately to disincentivize very long runs. |

The base checkpoint is fixed by `ecosystem`; there's no `model` field on the input.

::: tip Long-running step
Video training is the slowest training mode on the platform. LTX 2.3 in particular is expensive — keep `epochs` ≤ 3 unless you have a clear reason. Always use `wait=0` and follow up via webhook or polling.
:::

## The request shape

```json
{
  "$type": "training",
  "input": {
    "engine":    "ai-toolkit",
    "ecosystem": "ltx2"          // ltx2 | ltx23
  }
}
```

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A training-data zip containing source video clips
* An accurate `count` of clips in the zip

## LTX2

Original 19B-parameter LTX video model. `resolution: 768` is the typical training resolution.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training", "video"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "ltx2",
      "epochs": 2,
      "resolution": 768,
      "lr": 0.0002,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "https://civitai-delivery-worker-prod.5ac0637cfd0766c97916cefa3764fbdf.r2.cloudflarestorage.com/training-images/4470934/2725414TrainingData.nuB3.zip",
        "count": 4
      },
      "samples": { "prompts": ["a video of TOK", "TOK moving in a garden"] }
    }
  }]
}
```

## LTX 2.3

Newer 22B model. Same shape as LTX2; `lr` is typically lower and the per-epoch cost is materially higher (200 Buzz / epoch vs. ltx2's variable formula-based cost).

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training", "video"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "ltx23",
      "epochs": 2,
      "lr": 0.0001,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "https://civitai-delivery-worker-prod.5ac0637cfd0766c97916cefa3764fbdf.r2.cloudflarestorage.com/training-images/4470934/2725414TrainingData.nuB3.zip",
        "count": 4
      },
      "samples": { "prompts": ["a video of TOK", "TOK moving in a garden"] }
    }
  }]
}
```

## Common parameters {#common-parameters}

Defaults shown are the post-`ApplyDefaults` values for both LTX ecosystems.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `engine` | ✅ | — | Always `ai-toolkit`. |
| `ecosystem` | ✅ | — | `ltx2` or `ltx23`. |
| `epochs` | | `5` | `1`–`20`. Billed per epoch. Keep low (2–3) for video. |
| `numberOfRepeats` | | (no auto-default) | `1`–`5000`. |
| `lr` | | `0.0001` | LTX2 examples often use `0.0002`; LTX 2.3 typically `0.0001`. |
| `trainTextEncoder` | | `false` | Leave off — LTX text encoder is not retrained by AI Toolkit. |
| `lrScheduler` | | `cosine` | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. |
| `optimizerType` | | `adamw8bit` | See SDXL/SD1 page for full enum. |
| `networkDim` | | `32` | `1`–`256`. |
| `networkAlpha` | | matches `networkDim` | `1`–`256`. |
| `noiseOffset` | | `0` | `0`–`1`. |
| `flipAugmentation` | | `false` | Random horizontal flips. |
| `shuffleTokens` / `keepTokens` | | `false` / `0` | Caption-tag shuffling. |
| `triggerWord` | | *(none)* | Activation token. |
| `trainingData.{type, sourceUrl, count}` | ✅ | — | `type: "zip"`. Zip should contain video clips. |
| `samples.prompts[]` | | `[]` | Per-epoch preview videos. |
| `samples.negativePrompt` | | *(none)* | — |

## Reading the result

Same envelope as the other training recipes — see [SDXL/SD1 → Reading the result](./training-sdxl-sd1#reading-the-result). Each epoch yields a video LoRA `.safetensors` blob plus any sample `.mp4` files. Use the trained LoRA in [LTX2 video generation](./ltx2) by referencing it in the workflow's `loras` field.

## Runtime

Per-epoch wall time, default settings on a 4-clip dataset:

| Ecosystem | Per-epoch | Typical full run |
|-----------|-----------|-------------------|
| `ltx2` | ~3–8 min | 6–16 min for 2 epochs |
| `ltx23` | ~5–12 min | 10–25 min for 2 epochs |

Always use `wait=0`.

## Cost

LTX2 uses a formula-based cost (per-step area + clip count); LTX 2.3 is flat at 200 Buzz / epoch.

```
ltx2:  total = epochs × computed_cost   (formula varies with clip count + duration)
ltx23: total = 200 × epochs
```

| Configuration | Buzz (training only) |
|---------------|---------------------|
| LTX2, `epochs: 2`, 4 clips | ~10–40 (depends on clip duration) + samples |
| LTX 2.3, `epochs: 2` | 400 + samples |
| LTX 2.3, `epochs: 5` | 1000 + samples |

Sample-prompt rendering uses LTX2 video-generation rates and is billed separately. Run with `whatif=true` to see the exact pre-flight charge.
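
The flat LTX 2.3 rate is simple enough to check in code before submitting. `ltx23_training_buzz` is a hypothetical helper; it covers training only (samples are billed separately) and uses the `1`–`20` epoch range from the parameter table. The `ltx2` formula is not spelled out here, so it is deliberately left out.

```python
def ltx23_training_buzz(epochs: int) -> int:
    """Training-only Buzz for the ltx23 ecosystem: 200 per epoch."""
    if not 1 <= epochs <= 20:
        raise ValueError("epochs must be between 1 and 20")
    return 200 * epochs
```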

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "trainingData.sourceUrl not reachable" | Signed URL expired, or zip behind auth | Regenerate the URL. R2 signed URLs default to 24h. |
| Step `failed` with VRAM-related error | Resolution × clip length too high | Lower `resolution` (e.g. to `512`), shorten clips. |
| LTX 2.3 cost surprises you | Flat 200 Buzz / epoch, by design | Check `whatif=true` before submitting. Cap `epochs` at 2–3 unless you have budget. |
| Trained LoRA produces no motion | Too few epochs / static reference clips | Raise `epochs`, ensure clips show the motion you want learned. |
| Step `failed`, `moderationStatus: "Rejected"` | Dataset failed content moderation | Replace flagged clips. |

## Related

* [Wan video LoRA training](./training-wan) (preview)
* [LTX2 video generation](./ltx2) — use a trained LoRA in LTX2 inference
* [Flux 2 Klein LoRA training](./training-flux2-klein) — image-side counterpart
* [Results & webhooks](/orchestration/guide/results-and-webhooks)
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) / [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow)
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/training/openapi.yaml)

---

---
url: /orchestration/mcp.md
---

# Civitai Orchestration MCP Server

The orchestrator is also exposed as a remote [Model Context Protocol](https://modelcontextprotocol.io) server, so any MCP-aware client — Claude Desktop, claude.ai, Claude Code, Cursor, VS Code — can call it directly. The MCP server wraps the same workflow engine as the REST API, with one tool per recipe family (image, video, audio, music, training, analysis, utilities) plus tools for managing workflows.

If you already use the orchestrator via [REST](/orchestration/guide/getting-started), MCP gives you the same capabilities packaged for LLM agents: tools an agent can call, prompts that guide multi-step pipelines, and a resource scheme for fetching generated media inline.

## Endpoint

```
https://orchestration.civitai.com/mcp
```

Transport is **Streamable HTTP** — the modern MCP HTTP transport used by remote MCP servers in Claude Desktop and claude.ai. There is no binary to install; clients connect directly over HTTPS.

## Authentication

The MCP server uses the **same Civitai API key** as the [REST API](/orchestration/guide/authentication). Send it as a Bearer token in the `Authorization` header on every request, in the form `Authorization: Bearer YOUR_CIVITAI_API_KEY`.

Most tools (`generate_image`, `generate_video`, `transcribe_audio`, …) accept anonymous calls, but tools that read or list per-user state — most notably `list_workflows` — require a token. Authenticated calls are also tracked against your account for usage and Buzz accounting, so you'll generally want one configured.

## Connecting

### Claude Desktop

Add the server to `claude_desktop_config.json` (open it via **Settings → Developer → Edit Config**), then restart Claude Desktop.

The server appears in the MCP picker, and its tools become available in any conversation.

### claude.ai

In claude.ai, add a custom remote MCP server under **Settings → Connectors → Add custom connector**:

* **URL:** `https://orchestration.civitai.com/mcp`
* **Authentication:** Custom header `Authorization` with the value `Bearer YOUR_CIVITAI_API_KEY`

### Claude Code / Cursor / VS Code MCP

For Claude Code, run:

For Cursor or VS Code with the MCP extension, add the same shape to your `mcp.json`:

### Generic HTTP MCP clients

Any MCP client that speaks Streamable HTTP can connect — point it at `/mcp` and send the `Authorization` header. The server advertises full capabilities (`tools`, `prompts`, `resources`, all with `listChanged`) on `initialize`.
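
For a hand-rolled client, the first message is a JSON-RPC `initialize` request. The sketch below only builds that request with the standard library and does not send it. The `clientInfo` values are placeholders, and the `protocolVersion` string is an assumption about what the server negotiates; the `Accept` header lists both content types because Streamable HTTP servers may answer with either JSON or an event stream.

```python
import json
import urllib.request

MCP_URL = "https://orchestration.civitai.com/mcp"


def build_initialize_request(api_key, protocol_version="2025-03-26"):
    """Build (but do not send) the MCP initialize request."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,  # assumed version string
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to send it; in practice an MCP SDK handles this handshake for you.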

## What's available

* **Generation tools** — image, video, audio (TTS / transcription), music
* **Media utilities** — upscale, convert, frame extraction
* **Analysis** — caption, rate (NSFW / safety), tag
* **LLM access** — `chat_completion` against any OpenRouter model
* **Discovery** — `find_models` natural-language search across the catalog
* **Workflow management** — submit raw workflow JSON, get / cancel / list workflows
* **Prompts** — three built-in pipeline guides for common tasks
* **Resources** — `spine://blobs/{blobId}` for inline retrieval of generated media

See the [tools reference](/orchestration/mcp/tools) for the full catalog.

## Related

* [Authentication](/orchestration/guide/authentication) — how to get and rotate a Civitai API key
* [Recipes](/orchestration/recipes/) — REST examples for the same workflows the MCP tools wrap
* [Tools, prompts, and resources](/orchestration/mcp/tools) — full MCP catalog

---

---
url: /site/reference/model-versions.md
description: Fetch a specific version of a Civitai model by ID or file hash.
---

# Model versions

A **model version** is a single release within a model — one set of files, a
specific `baseModel`, its own stats, and its own AIR identifier. Models may
have many versions; call these endpoints when you need a specific one.

## Get a model version

```
GET /api/v1/model-versions/{id}
```

**Auth:** Mixed. A valid token exposes a few extra fields (e.g. early-access
data for resources the caller has unlocked).

### Path parameters

| Name | Type | Description |
|------|------|-------------|
| `id` | integer | Model version ID. |

### Response

```json
{
  "id": 2514310,
  "modelId": 827184,
  "name": "v16.0",
  "description": null,
  "baseModel": "Illustrious",
  "baseModelType": "Standard",
  "air": "urn:air:sdxl:checkpoint:civitai:827184@2514310",
  "status": "Published",
  "availability": "Public",
  "nsfwLevel": 3,
  "createdAt": "2025-12-18T08:55:00.000Z",
  "updatedAt": "2025-12-18T09:16:12.062Z",
  "publishedAt": "2025-12-18T09:16:12.062Z",
  "uploadType": "Created",
  "usageControl": "Download",
  "trainedWords": [],
  "earlyAccessConfig": null,
  "earlyAccessEndsAt": null,
  "trainingStatus": null,
  "trainingDetails": null,
  "stats": { "downloadCount": 215627, "thumbsUpCount": 13828 },
  "model": {
    "name": "WAI-illustrious-SDXL",
    "type": "Checkpoint",
    "nsfw": false,
    "poi": false
  },
  "files": [ /* see below */ ],
  "images": [ /* preview images, filtered by browsing level */ ],
  "downloadUrl": "https://civitai.com/api/download/models/2514310"
}
```

Each entry in `files[]`:

```json
{
  "id": 2402203,
  "name": "waiIllustriousSDXL_v160.safetensors",
  "type": "Model",
  "sizeKB": 6775430.35,
  "metadata": { "format": "SafeTensor", "size": "pruned", "fp": "fp16" },
  "pickleScanResult": "Success",
  "virusScanResult": "Success",
  "hashes": {
    "AutoV1": "4748A7F6",
    "AutoV2": "A5F58EB1C3",
    "SHA256": "A5F58EB1C33616...",
    "CRC32": "DAEE95B7",
    "BLAKE3": "1A411D9B...",
    "AutoV3": "22D8CB95B807"
  },
  "downloadUrl": "https://civitai.com/api/download/models/2514310",
  "primary": true
}
```

Returns `404` if the version doesn't exist or isn't published (moderators
bypass the published check).

### Notes

* The `air` field is the canonical [AIR identifier](../guide/air). Forward it directly to the Orchestration API when you need to reference this resource in a workflow.
* `images[]` respects the caller's browsing level — SFW-gated callers never see mature previews. On Civitai's "green" domain or from restricted regions, images are filtered to SFW regardless of session.
* `files[]` only contains public files. Private / archived files are omitted.
* `model.mode` appears as `Archived` or `TakenDown` when the parent model has been moderated. When archived, `files[]` and `downloadUrl` are dropped; when taken down, `images[]` is dropped as well. The field is omitted entirely on healthy models.
* `stats` has only `downloadCount` and `thumbsUpCount` here — model-version-level metrics. Use [`GET /models/{id}`](./models#get-a-model) if you need the full set including comments and tipping.
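
When you need the IDs back out of an `air` value, a small parser helps. This is a sketch that handles only Civitai-sourced identifiers of the form shown in the response above; the field names `ecosystem` and `type` are illustrative labels, and AIRs from other sources are rejected.

```python
import re
from typing import NamedTuple


class CivitaiAir(NamedTuple):
    ecosystem: str
    type: str
    model_id: int
    version_id: int


_AIR_RE = re.compile(r"urn:air:([^:]+):([^:]+):civitai:(\d+)@(\d+)")


def parse_civitai_air(air: str) -> CivitaiAir:
    """Split a Civitai model-version AIR into its parts."""
    match = _AIR_RE.fullmatch(air)
    if match is None:
        raise ValueError(f"not a civitai model-version AIR: {air!r}")
    ecosystem, kind, model_id, version_id = match.groups()
    return CivitaiAir(ecosystem, kind, int(model_id), int(version_id))
```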

### Example

```bash
curl "https://civitai.com/api/v1/model-versions/2514310" | jq '{id, name, air, downloadUrl}'
```

## Get a model version by file hash

```
GET /api/v1/model-versions/by-hash/{hash}
```

**Auth:** Public.

Useful when you have a local file and want to identify the model without
downloading anything from Civitai. Accepts any of the hash types Civitai
records: `AutoV1`, `AutoV2`, `AutoV3`, `SHA256`, `BLAKE3`, or `CRC32`. The
hash is matched case-insensitively.

### Path parameters

| Name | Type | Description |
|------|------|-------------|
| `hash` | string | File hash. |

### Response

Same shape as `GET /model-versions/{id}`.

Returns `404` if no matching file is found, or the file belongs to an
unpublished version.

### Example

```bash
# Identify a local .safetensors by its SHA256
sha256sum model.safetensors
# a5f58eb1c33616c4f06bca55af39876a7b817913cd829caa8acb111b770c85cc

curl "https://civitai.com/api/v1/model-versions/by-hash/A5F58EB1C33616C4F06BCA55AF39876A7B817913CD829CAA8ACB111B770C85CC" \
  | jq '{id, modelId, name, air}'
```
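
The same lookup can be prepared from Python without shelling out. In this sketch `sha256_of_file` and `by_hash_url` are hypothetical helper names; the file is read in chunks so large checkpoints do not have to fit in memory, and the digest is uppercased only to match the examples (the endpoint matches case-insensitively).

```python
import hashlib

BY_HASH_URL = "https://civitai.com/api/v1/model-versions/by-hash/"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def by_hash_url(path):
    """Build the by-hash lookup URL for a local file."""
    return BY_HASH_URL + sha256_of_file(path)
```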

## Bulk lookup by hash

```
POST /api/v1/model-versions/by-hash
```

**Auth:** Public.

Same as `GET /by-hash/{hash}`, but takes up to **100** SHA256 hashes in a
single request. Useful when scanning a directory of local files. Hashes
shorter or longer than 64 characters are rejected (`400`); each must be the
full SHA256.

### Request body

```json
[
  "A5F58EB1C33616C4F06BCA55AF39876A7B817913CD829CAA8ACB111B770C85CC",
  "B7C9D1F2A3E4B5C6D7E8F9A0B1C2D3E4F5A6B7C8D9E0F1A2B3C4D5E6F7A8B9C0"
]
```

### Response

An array of model version objects, same shape as `GET /model-versions/{id}`.
Hashes that don't match any file are silently dropped — the response can
have fewer entries than the request.

```json
[
  { "id": 2514310, "modelId": 827184, "name": "v16.0", "...": "..." }
]
```

### Errors

| Status | Cause |
|--------|-------|
| `400` | Missing body, non-array, hash not 64 chars, or more than 100 entries. The error message lists the first parse failure. |

### Example

```bash
curl -X POST -H "Content-Type: application/json" \
  -d '["A5F58EB1...","B7C9D1F2..."]' \
  "https://civitai.com/api/v1/model-versions/by-hash"
```

::: tip
If you only need the IDs (e.g. to feed back into the Orchestration API or to
de-duplicate a download list), use the lighter
[`/by-hash/ids`](#bulk-lookup-hash-id) endpoint below — it returns just
`{modelVersionId, hash}` pairs and is cheaper.
:::
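
When a directory scan yields more than 100 hashes, the list has to be split before calling the bulk endpoint. The sketch below validates and batches on the client side; `batch_sha256` is a hypothetical helper, and the hex-digit check is an extra assumption on top of the 64-character rule stated above.

```python
import re

_SHA256_RE = re.compile(r"[0-9A-Fa-f]{64}")


def batch_sha256(hashes, batch_size=100):
    """Validate full SHA256 strings and split them into request-sized lists."""
    checked = []
    for value in hashes:
        if not _SHA256_RE.fullmatch(value):
            raise ValueError(f"not a full SHA256 hash: {value!r}")
        checked.append(value)
    return [checked[i:i + batch_size] for i in range(0, len(checked), batch_size)]
```

Each returned list can be serialized as one request body. Raise `batch_size` for the ID-only endpoint, which accepts far larger batches.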

## Bulk lookup hash → ID {#bulk-lookup-hash-id}

```
POST /api/v1/model-versions/by-hash/ids
```

**Auth:** Public.

Resolves SHA256 hashes to model version IDs only. Accepts up to **10,000**
hashes per call. Use this when you don't need the full version object —
e.g. to dedupe a download list or to map local files back to Civitai IDs in
bulk.

### Request body

```json
[
  "A5F58EB1C33616C4F06BCA55AF39876A7B817913CD829CAA8ACB111B770C85CC",
  "B7C9D1F2A3E4B5C6D7E8F9A0B1C2D3E4F5A6B7C8D9E0F1A2B3C4D5E6F7A8B9C0"
]
```

### Response

```json
[
  { "modelVersionId": 2514310, "hash": "A5F58EB1C33616C4F06BCA55AF39876A7B817913CD829CAA8ACB111B770C85CC" }
]
```

Unmatched hashes are silently dropped.

### Example

```bash
# Map a manifest of local files to model version IDs
jq -r '.files[].sha256' manifest.json \
  | jq -R . | jq -s . \
  | curl -X POST -H "Content-Type: application/json" -d @- \
      "https://civitai.com/api/v1/model-versions/by-hash/ids"
```

## Get a minimal model version

```
GET /api/v1/model-versions/mini/{id}
```

**Auth:** Mixed.

A trimmed-down version of `GET /model-versions/{id}`, intended for clients
that need the bare minimum to **download a file** or **identify whether the
caller can generate** with it. Skips heavy fields like `images[]`,
`description`, and the full `files[]` array.

### Path parameters

| Name | Type | Description |
|------|------|-------------|
| `id` | integer | Model version ID. |

### Query parameters

| Name | Type | Description |
|------|------|-------------|
| `epoch` | integer | For `Private` training-result versions, request a specific epoch's file. Falls back to the last epoch if omitted. |

### Response

```json
{
  "air": "urn:air:sdxl:checkpoint:civitai:827184@2514310",
  "versionName": "v16.0",
  "modelName": "WAI-illustrious-SDXL",
  "baseModel": "Illustrious",
  "availability": "Public",
  "publishedAt": "2025-12-18T09:16:12.062Z",
  "size": 6775430.35,
  "fileType": "Model",
  "fileName": "waiIllustriousSDXL_v160.safetensors",
  "hashes": {
    "AutoV1": "4748A7F6",
    "AutoV2": "A5F58EB1C3",
    "SHA256": "A5F58EB1C33616...",
    "CRC32": "DAEE95B7",
    "BLAKE3": "1A411D9B...",
    "AutoV3": "22D8CB95B807"
  },
  "downloadUrls": ["https://civitai.com/api/download/models/2514310"],
  "format": "SafeTensor",
  "canGenerate": true,
  "isFeatured": false,
  "requireAuth": false,
  "checkPermission": false,
  "earlyAccessEndsAt": null,
  "freeTrialLimit": null,
  "additionalResourceCharge": false,
  "minor": false,
  "sfwOnly": false
}
```

### Field notes

| Field | Description |
|-------|-------------|
| `canGenerate` | `true` when the resource can be used in an Orchestration workflow for the calling user. Combines coverage, availability, and permission checks. |
| `checkPermission` | `true` when the resource is gated (early-access window active, or `Private`). Pair with [`/permissions/check`](./permissions) for an explicit yes/no. |
| `requireAuth` | When `true`, the `downloadUrls` require a token (passed as `Authorization: Bearer` or `?token=`). |
| `earlyAccessEndsAt` | Set only when `checkPermission` is `true` (`null` otherwise, as in the sample above). ISO timestamp when the early-access window ends. |
| `freeTrialLimit` | Number of free generations allowed during early access, when configured. |
| `additionalResourceCharge` | `true` when generating with this resource costs extra Buzz beyond the base workflow cost. |

Returns `404` if the version doesn't exist, isn't published, the primary
file is missing, or (for private training results) the requested `epoch`
isn't found.

### Example

```bash
# Just the download URL and SHA256, fast
curl "https://civitai.com/api/v1/model-versions/mini/2514310" \
  | jq '{air, downloadUrls, "sha256": .hashes.SHA256}'
```
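
A downloader can use the `requireAuth` flag to decide whether to attach a token. The sketch below is one way to do that, assuming the response shape shown above; `download_plan` is a hypothetical helper that picks the first entry in `downloadUrls` and uses the `Authorization: Bearer` form rather than `?token=`.

```python
def download_plan(mini: dict, token=None):
    """Return (url, headers) for downloading the file described by a mini response."""
    urls = mini.get("downloadUrls") or []
    if not urls:
        raise ValueError("response has no downloadUrls")
    headers = {}
    if mini.get("requireAuth"):
        if token is None:
            raise PermissionError("this file requires a token")
        headers["Authorization"] = f"Bearer {token}"
    return urls[0], headers
```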

---

---
url: /site/reference/models.md
description: List and fetch Civitai models.
---

# Models

A **model** represents a trained AI resource published on Civitai — a
checkpoint, LoRA, textual inversion, VAE, ControlNet, upscaler, etc. Each
model has one or more [model versions](./model-versions) containing the
actual files.

## List models

```
GET /api/v1/models
```

**Auth:** Mixed — the `favorites` and `hidden` params require a bearer token.

### Query parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `limit` | integer (1–100) | 100 | Number of items per page. |
| `page` | integer (≥ 1) | — | 1-indexed page number. Incompatible with `query`. |
| `cursor` | string | — | Opaque pagination cursor. Use `metadata.nextCursor` from the previous response. |
| `query` | string | — | Full-text search (Meilisearch). Requires cursor-based pagination. |
| `ids` | comma-separated integers | — | Restrict to specific model IDs. |
| `tag` | string | — | Filter by tag name. |
| `username` | string | — | Filter by creator. Auto-slugified. |
| `types` | `ModelType` or `ModelType[]` | — | One or more of the values from `GET /enums` (`ModelType`). Repeat the param or comma-separate. |
| `baseModels` | string or string\[] | — | Filter by base model (e.g. `SDXL 1.0`, `Flux.1 D`). See `GET /enums` (`BaseModel`). |
| `checkpointType` | `Standard` \| `Trained` \| `Merge` | — | For checkpoint models only. |
| `sort` | `Highest Rated` \| `Most Downloaded` \| `Newest` \| ... | `Highest Rated` | See source for full list. |
| `period` | `AllTime` \| `Year` \| `Month` \| `Week` \| `Day` | `AllTime` | Time window for sort metrics. |
| `nsfw` | boolean | `false` | If `true`, include mature content. Ignored on SFW-gated regions. |
| `supportsGeneration` | boolean | — | Only return models supported by on-site generation. |
| `fromPlatform` | boolean | — | Only return models trained on Civitai. |
| `earlyAccess` | boolean | — | Include early-access versions. |
| `primaryFileOnly` | boolean | `false` | Drop non-primary files from each version's `files[]`. |
| `favorites` | boolean | `false` | *(auth required)* Only models in the caller's bookmark collection. |
| `hidden` | boolean | `false` | *(auth required)* Only models the caller has hidden. |

Unknown params are silently ignored after Zod parsing; invalid ones return `400`.

### Response

```json
{
  "items": [
    {
      "id": 827184,
      "name": "WAI-illustrious-SDXL",
      "description": "<p>...</p>",
      "type": "Checkpoint",
      "nsfw": false,
      "nsfwLevel": 31,
      "availability": "Public",
      "supportsGeneration": true,
      "allowNoCredit": true,
      "allowCommercialUse": "{Image,RentCivit}",
      "allowDerivatives": true,
      "allowDifferentLicense": true,
      "minor": false,
      "poi": false,
      "sfwOnly": false,
      "mode": null,
      "stats": {
        "downloadCount": 1272529,
        "thumbsUpCount": 79272,
        "thumbsDownCount": 202,
        "commentCount": 1931,
        "tippedAmountCount": 156742
      },
      "creator": {
        "username": "WAI0731",
        "image": "https://image.civitai.com/.../WAI0731.jpeg"
      },
      "tags": ["base model", "anime"],
      "modelVersions": [
        {
          "id": 2514310,
          "name": "v16.0",
          "baseModel": "Illustrious",
          "baseModelType": "Standard",
          "publishedAt": "2025-12-18T09:16:12.062Z",
          "supportsGeneration": true,
          "stats": { "downloadCount": 215627, "thumbsUpCount": 13828, "thumbsDownCount": 22 },
          "files": [
            {
              "id": 2402203,
              "name": "waiIllustriousSDXL_v160.safetensors",
              "type": "Model",
              "sizeKB": 6775430.35,
              "hashes": {
                "AutoV2": "A5F58EB1C3",
                "SHA256": "A5F58EB1C3...",
                "BLAKE3": "1A411D9B..."
              },
              "downloadUrl": "https://civitai.com/api/download/models/2514310",
              "primary": true,
              "metadata": { "format": "SafeTensor", "size": "pruned", "fp": "fp16" }
            }
          ],
          "images": [],
          "downloadUrl": "https://civitai.com/api/download/models/2514310"
        }
      ]
    }
  ],
  "metadata": {
    "nextCursor": "75363|932023|257749",
    "nextPage": "https://civitai.com/api/v1/models?limit=100&cursor=...",
    "currentPage": 1,
    "pageSize": 100
  }
}
```

When using `page` pagination, `metadata` additionally includes `currentPage` and `pageSize`. When using `cursor` pagination, those are omitted.

### Notes

* `page * limit` above 1000 returns `429`; use `cursor` for deep paging. See [Pagination](../guide/pagination).
* Including `query` without `cursor` is fine; combining `query` with `page` returns `400`.
* Only `Published` versions are returned to non-moderator callers. Files marked non-public by the uploader are hidden from `files[]`.
* `mode` is non-null when the parent model has been moderated. Values: `Archived` (drops `files[]` and `downloadUrl`) and `TakenDown` (also drops `images[]`). It is `null` on healthy models.
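
Deep paging with `cursor` can be sketched as follows. This is a minimal Python sketch rather than an official client: `fetch_page` is a hypothetical callable you supply (for example a thin wrapper around your HTTP library) that takes a URL and returns the decoded JSON body. The loop relies only on `items` and `metadata.nextPage` from the response above, and it assumes `nextPage` is absent or `null` on the last page.

```python
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://civitai.com/api/v1/models"


def iter_models(
    fetch_page: Callable[[str], Dict[str, Any]],
    params: Optional[Dict[str, Any]] = None,
    max_pages: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield model items page by page, following metadata.nextPage.

    fetch_page is a hypothetical helper: it receives a URL and must
    return the decoded JSON response as a dict.
    """
    url: Optional[str] = BASE_URL
    if params:
        url = f"{BASE_URL}?{urlencode(params)}"
    pages = 0
    while url and pages < max_pages:
        body = fetch_page(url)
        yield from body.get("items", [])
        # nextPage already carries the cursor; stop when it is missing
        # (assumed to mean "last page").
        url = body.get("metadata", {}).get("nextPage")
        pages += 1
```

The `max_pages` guard is an illustrative safety net so a misbehaving response cannot loop forever.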

### Example

```bash
curl "https://civitai.com/api/v1/models?limit=5&types=LORA&baseModels=SDXL%201.0&sort=Most%20Downloaded"
```

## Get a model

```
GET /api/v1/models/{id}
```

**Auth:** Public.

### Path parameters

| Name | Type | Description |
|------|------|-------------|
| `id` | integer | Model ID. |

### Response

Returns the same shape as a single item from the list endpoint — same
top-level keys (`id`, `name`, `type`, `modelVersions`, `creator`, `tags`,
`stats`, ...).

Returns `404` if the model doesn't exist:

```json
{ "error": "No model with id 0" }
```

### Example

```bash
curl "https://civitai.com/api/v1/models/827184"
```

---

---
url: /orchestration/recipes/multi-speaker-dialogue.md
---

# Multi-speaker dialogue

The `audioMix` step overlays multiple audio clips on a single timeline, each placed at its own start offset with optional per-track volume and fades. Pair it with N `textToSpeech` steps to produce multi-speaker dialogue, debate, or audio drama — including overlap, interruption, and cross-talk that single-utterance TTS can't model on its own.

::: tip Why not a multi-speaker TTS engine?
Qwen3 TTS synthesises one continuous utterance per request with no silence-injection or speaker switching. Asking the model to "say A, then pause, then say B" produces unpredictable prosody. Generating each line as its own short, clean TTS step and overlaying them with `audioMix` keeps every utterance natural while letting you place them anywhere on the output timeline — including overlapping intervals, which is the only way to get genuine cross-talk.
:::

## How it composes

Every dialogue workflow has the same shape:

1. **One `textToSpeech` step per spoken line** — each step picks its own speaker (built-in voice, voice clone, or voice design) and produces a short clean clip.
2. **One trailing `audioMix` step** referencing each TTS output via `$ref`. By default, tracks play back-to-back in array order — no timing math required.

```json
{
  "steps": [
    { "$type": "textToSpeech", "name": "alice", "input": { /* ... */ } },
    { "$type": "textToSpeech", "name": "bob",   "input": { /* ... */ } },
    {
      "$type": "audioMix",
      "input": {
        "tracks": [
          { "url": { "$ref": "alice", "path": "output.audioBlob.url" } },
          { "url": { "$ref": "bob",   "path": "output.audioBlob.url" } }
        ]
      }
    }
  ]
}
```

### Timeline rules

Each track resolves a start time on the output timeline using these rules:

* **Implicit (default)**: track *i* starts when track *i-1* ends (in array order). No fields needed.
* **`offset`**: float in seconds, nudges the implicit position. Negative = overlap/interrupt; positive = gap. `offset: -0.5` means "start 500 ms before the previous track ends".
* **`startSeconds`**: absolute timeline anchor. When set, the track plays at exactly this time **and is excluded from the implicit chain** — perfect for music beds. Other tracks in the array compute their implicit position as if anchored tracks weren't there.

If both `startSeconds` and `offset` are set on the same track, `startSeconds` wins.

The output is a single Ogg Vorbis blob plus a `tracks[]` array echoing each input's resolved `startSeconds` and probed `duration` — convenient for rendering subtitles or speaker highlights without re-probing.
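
The timeline rules above can be sketched in Python to predict where each track will land before you submit. This is an illustration, not the server's implementation: it assumes you already know each clip's duration (the server probes it), that a track nudged by `offset` moves the chain along with it, and that starts are clamped at zero. The function name `resolve_timeline` is hypothetical.

```python
from typing import Dict, List, Optional


def resolve_timeline(tracks: List[Dict[str, Optional[float]]]) -> List[float]:
    """Return the resolved start time (seconds) for each track.

    Each track dict needs a "duration" and may carry "startSeconds"
    or "offset", mirroring the per-track fields on this page.
    """
    starts: List[float] = []
    chain_end = 0.0  # end of the last non-anchored track
    for track in tracks:
        anchor = track.get("startSeconds")
        if anchor is not None:
            # Anchored: fixed position, excluded from the implicit chain.
            # Any offset on the same track is ignored.
            starts.append(float(anchor))
            continue
        # Implicit: previous chained track's end, nudged by offset.
        start = chain_end + float(track.get("offset") or 0.0)
        start = max(start, 0.0)  # assumption: never before the timeline start
        starts.append(start)
        chain_end = start + float(track["duration"])
    return starts
```

For example, a 4 s line, a 2 s interruption with `offset: -0.5`, a 30 s bed anchored at `startSeconds: 0`, and a final 1 s line resolve to starts of 0, 3.5, 0, and 5.5 seconds under these assumptions.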

## Sequential reading

Three speakers, each clip placed after the previous one ends. No overlap; the gaps between clips are silent.

## Crosstalk and interruption

A speaker starts before the previous one finishes. ffmpeg's `amix` sums the overlapping samples, so the two voices are audible simultaneously. A small `fadeInMs` on the interrupter softens the entry.

For a "hot debate" effect, set `offset: -0.3` to `-1.0` on each interrupter — negative offsets pull the track earlier on the timeline. Use mild attenuation (`volumeDb: -1` to `-3`) on whichever speaker should sit slightly back in the mix.
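
A hedged sketch of building the `audioMix` step for an interruption in Python. The step names `pro` and `con`, the helper `ref_track`, and the specific values are illustrative; the field names come from the tables on this page, and the `textToSpeech` steps themselves are left out.

```python
import json
from typing import Any, Dict


def ref_track(step_name: str, **options: Any) -> Dict[str, Any]:
    """Build one audioMix track pointing at a prior step's audio output."""
    track: Dict[str, Any] = {
        "url": {"$ref": step_name, "path": "output.audioBlob.url"}
    }
    track.update(options)
    return track


audio_mix_step = {
    "$type": "audioMix",
    "input": {
        "tracks": [
            ref_track("pro"),
            # Starts 0.6 s before "pro" ends, sits slightly back in the mix,
            # and fades in to soften the interruption.
            ref_track("con", offset=-0.6, volumeDb=-2, fadeInMs=80),
        ]
    },
}

print(json.dumps(audio_mix_step, indent=2))
```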

## Adding a music or ambience bed

The `url` field also accepts a direct URL string — no `$ref` needed — so you can drop in static background music or ambience under a voice track.

The bed sits at `volumeDb: -18` (well under speech), fades in over 500 ms, and fades out over 1.5 s. Keep beds at -15 dB or lower against speech.
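
One way to attach such a bed in Python, as a sketch: the helper name `with_music_bed` and the example URL are hypothetical, and the levels match the values described above. Anchoring the bed with `startSeconds: 0` keeps it out of the implicit chain, so its position in the array does not shift the voice tracks.

```python
from typing import Any, Dict, List


def with_music_bed(voice_tracks: List[Dict[str, Any]], bed_url: str) -> List[Dict[str, Any]]:
    """Return the voice tracks plus a bed anchored at the start of the mix."""
    bed = {
        "url": bed_url,      # direct URL string, no $ref needed
        "startSeconds": 0,   # anchored: excluded from implicit sequencing
        "volumeDb": -18,     # well under speech
        "fadeInMs": 500,
        "fadeOutMs": 1500,
    }
    return [*voice_tracks, bed]


voices = [{"url": {"$ref": "narrator", "path": "output.audioBlob.url"}}]
tracks = with_music_bed(voices, "https://example.com/ambience.ogg")  # placeholder URL
```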

## Input fields

### `audioMix` step

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `tracks` | ✅ | — | Array of tracks to overlay. At least one. |
| `normalize` | | `false` | When `true`, ffmpeg's `amix` divides by N to avoid clipping. Keep `false` when you've set per-track `volumeDb` and want the levels you specified. |
| `maxDurationSeconds` | | `600` | Server-side cap on output length. The job fails early if the union of track intervals exceeds this. |

### Per-track fields

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `url` | ✅ | — | Either a direct `"https://..."` URL string, or a `{ "$ref": "<step-name>", "path": "output.audioBlob.url" }` referencing a prior step's output. |
| `startSeconds` | | implicit | Absolute timeline anchor. Set this to pin a track to a fixed time (music bed, ambience). When set, the track is taken out of the implicit-sequencing chain. When unset, the track plays after the previous non-anchored track ends. |
| `offset` | | `0` | Seconds to nudge this track from its implicit position. Negative = overlap/interrupt; positive = gap. Ignored when `startSeconds` is set. |
| `volumeDb` | | `0` | Per-track gain in dB. `-6` roughly halves the amplitude; `-18` is a typical music-bed level. |
| `fadeInMs` | | `0` | Linear fade-in length applied at the track's resolved start. |
| `fadeOutMs` | | `0` | Linear fade-out applied at the track's tail (resolved start + duration − fadeOutMs). |
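
To get a feel for `volumeDb` values, the standard decibel-to-amplitude conversion can be sketched in Python. This assumes the usual amplitude convention (20 dB per factor of ten); how the server applies the gain internally is not specified here.

```python
def db_to_gain(volume_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (volume_db / 20)


for db in (0, -3, -6, -18):
    print(f"{db:>4} dB -> x{db_to_gain(db):.3f}")
```

Under that convention, `-6` comes out at roughly half amplitude and `-18` at roughly one eighth.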

## Reading the result

```json
{
  "status": "succeeded",
  "steps": [
    { "name": "opener", "$type": "textToSpeech", "output": { "audioBlob": { /* ... */ } } },
    { "name": "pro",    "$type": "textToSpeech", "output": { "audioBlob": { /* ... */ } } },
    { "name": "con",    "$type": "textToSpeech", "output": { "audioBlob": { /* ... */ } } },
    {
      "name": "3",
      "$type": "audioMix",
      "status": "succeeded",
      "output": {
        "audioBlob": {
          "id": "ZXNS7C...ogg",
          "url": "https://orchestration-new.civitai.com/v2/consumer/blobs/ZXNS7C...ogg?sig=...",
          "duration": 17.8
        },
        "tracks": [
          { "startSeconds":  0.0, "duration": 5.7 },
          { "startSeconds":  5.7, "duration": 5.9 },
          { "startSeconds": 11.6, "duration": 6.2 }
        ]
      }
    }
  ]
}
```

* **`audioBlob.url`** — signed URL for the mixed Ogg Vorbis output. Stream it directly in an `<audio src>` tag.
* **`audioBlob.duration`** — total output length in seconds (max of resolved `startSeconds + duration` across tracks).
* **`tracks[]`** — per-input timing in the order they were submitted, useful for rendering subtitle overlays or speaker-highlight UI.
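
As a sketch of the subtitle use case, the Python below pairs each submitted line with its window on the mix timeline. The function name `subtitle_cues` and the cue shape are hypothetical; it assumes `tracks[]` comes back in submission order, as described above.

```python
from typing import Any, Dict, List


def subtitle_cues(output: Dict[str, Any], lines: List[str]) -> List[Dict[str, Any]]:
    """Pair each spoken line with its start/end time in the mixed output."""
    cues = []
    for text, track in zip(lines, output.get("tracks", [])):
        start = track["startSeconds"]
        # Round to milliseconds to avoid floating-point noise in the cue times.
        end = round(start + track["duration"], 3)
        cues.append({"start": start, "end": end, "text": text})
    return cues
```

If `tracks[]` is empty, the function returns no cues, and you would fall back to timing computed from your own inputs.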

## Runtime

The `audioMix` step itself is cheap — typically a second or two of ffmpeg on a CPU worker. The wall-clock for the whole workflow is dominated by the TTS steps, which run in parallel on the Qwen workers and each take 60–120 s for a short line. Submit with `wait=0` and poll, the same as plain `textToSpeech`.

## Cost

`audioMix` itself is **free** — `Factors: []`, no fixed cost. The expensive work, generating each utterance, is already priced on the underlying `textToSpeech` steps under their existing per-character formula. See the [text-to-speech recipe](./text-to-speech#cost) for the TTS pricing.

The server enforces `maxDurationSeconds` (default 600) as a guardrail against runaway mixes; raise it on the input if you genuinely need longer output.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` validation error, "AudioMix track has no resolved url" | A `$ref` failed to resolve — the referenced step name doesn't match, or its output's path doesn't exist | Confirm each prior step has a unique `name` and that the `$ref.path` is `"output.audioBlob.url"` (case-sensitive). |
| Output clips/distorts when speakers overlap | Two or more tracks at `volumeDb: 0` summing to over full-scale | Either set `"normalize": true`, or attenuate the louder track with `volumeDb: -3` to `-6`. |
| Music bed too loud against speech | Bed `volumeDb` too high | Drop the bed to `-18` to `-24` dB; voice tracks at 0 dB then sit cleanly on top. |
| Cross-talk sounds abrupt | Interrupter starts with no fade | Add `fadeInMs: 60–120` on the interrupting track. |
| `failed`, "AudioMix output would be Xs, exceeding MaxDurationSeconds" | Resolved `startSeconds + duration` exceeds the cap on at least one track | Raise `maxDurationSeconds` on the input, or shorten the tracks. |
| Mix succeeded but `tracks[]` is empty | The middleware succeeded but the timing payload didn't propagate (rare) | The `audioBlob.duration` is still reliable; recompute per-track timing client-side from the inputs you sent. |

## Related

* [Text-to-speech](./text-to-speech) — single-utterance synthesis; the building block this recipe composes.
* [Transcription](./transcription) — the inverse, useful for caption tracks over a finished mix.
* [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) — how `$ref` lets one step consume another's output.
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — for workflows with several long-running TTS steps.
* The [`AudioMixInput` and `AudioMixOutput` schemas](/orchestration/reference/operations/InvokeAudioMixStepTemplate) — full parameter reference.

---

---
url: /site/oauth.md
description: >-
  Authorize a third-party app to act on a Civitai user's behalf using OAuth 2.0
  with PKCE.
---

# OAuth

Civitai exposes an OAuth 2.0 server at `civitai.com/api/auth/oauth/*` so
third-party apps can act on a user's behalf — read their profile, manage
their content, or spend their buzz on AI generation — without ever seeing
their password or a long-lived API key.

## OAuth or API keys?

| Use… | When |
|---|---|
| [API keys](../guide/authentication) | You're scripting **your own** account (CI jobs, personal automation, server-to-server with your own buzz). |
| **OAuth** | You're building an app that **other users sign into**. Each user grants your app a scoped, revocable token. |

OAuth tokens are scoped (users see exactly what your app is asking for), can
be capped for buzz spend, and can be revoked from civitai.com at any time —
without rotating anything on your side.

## Supported flow

Civitai implements **Authorization Code with PKCE** ([RFC 7636](https://datatracker.ietf.org/doc/html/rfc7636))
with refresh tokens. PKCE (S256) is mandatory for every client, public or
confidential — there's no "skip PKCE because we have a secret" path.

```mermaid
sequenceDiagram
    participant U as User
    participant A as Your App
    participant C as civitai.com
    U->>A: Click "Sign in with Civitai"
    A->>U: Redirect to /authorize (code_challenge, state, scope)
    U->>C: Approves consent
    C->>A: Redirect to your callback (code, state)
    A->>C: POST /token (code, code_verifier, client_secret?)
    C->>A: { access_token, refresh_token, expires_in }
    A->>C: GET /userinfo (Bearer access_token)
```

`client_credentials` is also supported for confidential clients that need
to act on **their own owner account** (no end-user involved) — useful for
server-side maintenance jobs.

## Endpoint roster

| Endpoint | Purpose |
|---|---|
| `GET/POST /api/auth/oauth/authorize` | Start the flow; user signs in and consents. |
| `POST /api/auth/oauth/token` | Exchange `code` for tokens, or refresh. |
| `POST /api/auth/oauth/revoke` | Invalidate an access or refresh token. |
| `GET /api/auth/oauth/userinfo` | Identify the user behind an access token. |

See [Endpoints](./endpoints) for the request/response shape of each.

## Tokens at a glance

* **Format** — opaque, prefixed `civitai_…`; SHA-256 hashed at rest.
* **Access token** — 1 hour TTL. Send as `Authorization: Bearer <token>`.
* **Refresh token** — 30 day TTL. Exchange for a new access token via `POST /token`
  with `grant_type=refresh_token`.
* **Authorization code** — 10 minute TTL. Single-use.

Treat tokens as bearer credentials: anyone with the string can act as the
user, within the granted scopes and budget.

## Next steps

* [Register an app](./register-app) — create a `client_id` from your Civitai account.
* [Quickstart](./quickstart) — run through the flow end-to-end with curl.
* [Scopes](./scopes) — pick the right permissions for what your app needs.
* [Buzz limits](./buzz-limits) — what to expect when users cap your app's spend.

::: info Reference implementation
**[civitai/civitai-oauth-demo](https://github.com/civitai/civitai-oauth-demo)**
— minimal Node.js / Express integration covering authorize, exchange,
refresh, and revoke. Clone it, fill in your `client_id` / `client_secret`,
and run the full flow against your account in a few minutes.
:::

---

---
url: /site/oauth/endpoints.md
description: >-
  Request/response reference for the Civitai OAuth authorize, token, revoke, and
  userinfo endpoints.
---

# OAuth Endpoints

Base URL for every endpoint on this page: **`https://civitai.com`**.

## `GET/POST /api/auth/oauth/authorize`

Start the Authorization Code + PKCE flow. The caller is the **end user's
browser** — your app sends the user here, Civitai handles sign-in and
consent, then redirects them back to your `redirect_uri`.

### Request parameters

All parameters are URL-query on `GET` and form-body on the consent `POST`.

| Param | Required | Notes |
|---|:-:|---|
| `response_type` | ✓ | Must be `code`. |
| `client_id` | ✓ | From [app registration](./register-app). |
| `redirect_uri` | ✓ | Must exact-match one of the URIs you registered. |
| `scope` | ✓ | Decimal integer bitmask. See [Scopes](./scopes). |
| `state` | ✓ | Opaque value echoed back on the redirect. Use it to bind the response to a session and defeat CSRF. |
| `code_challenge` | ✓ | URL-safe base64 SHA-256 of your code verifier. |
| `code_challenge_method` | ✓ | Must be `S256` — `plain` is rejected. |
| `approved` | *consent POST* | `true` when the user clicks Allow on the consent screen. |
| `remember` | *consent POST* | `true` to persist consent so subsequent flows skip the screen. |
| `buzz_limit` | *consent POST* | JSON-encoded buzz-spend budget. See [Buzz limits](./buzz-limits). |
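
A minimal Python sketch of assembling the `GET` URL from the parameters above. The function name `build_authorize_url` is illustrative; the parameter names and the `S256` requirement come from the table. Using `urlencode` percent-encodes the `redirect_uri` for you.

```python
from urllib.parse import urlencode

AUTHORIZE_URL = "https://civitai.com/api/auth/oauth/authorize"


def build_authorize_url(client_id: str, redirect_uri: str, scope: int,
                        state: str, code_challenge: str) -> str:
    """Return the URL to send the user's browser to."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,   # percent-encoded by urlencode
        "scope": str(scope),            # decimal integer bitmask
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    })
    return f"{AUTHORIZE_URL}?{query}"
```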

### Behavior

* If the user has no session, Civitai redirects them to `/login` with a
  return URL pointing back at `/authorize`. Your app's request continues
  once they sign in.
* If the user has already consented with the **same `scope`**, Civitai
  skips the consent page and issues a code immediately.
* If the requested scope is wider than the prior consent (or there's no
  prior consent), the user sees the consent page.

### Successful response

`302 Found` redirect to:

```
<redirect_uri>?code=<authorization_code>&state=<your_state>
```

The `code` is single-use, valid for 10 minutes.

### Error responses

| Status | `error` | Cause |
|---:|---|---|
| 400 | `invalid_request` | Missing required param, bad `redirect_uri`, missing or non-S256 PKCE, missing state. |
| 400 | `invalid_client` | `client_id` doesn't exist. |
| 400 | `invalid_scope` | Scope is not a non-negative integer ≤ `Full`. |
| 429 | *rate\_limit* | More than **10 requests / minute / user**. |

### CORS

Permissive with credentials (`Access-Control-Allow-Credentials: true`) on
the preflight, but this endpoint is meant for top-level browser navigation
— don't call it from `fetch()`.

***

## `POST /api/auth/oauth/token`

Exchange an authorization code for tokens, or refresh an existing pair,
or mint a client-owned token via `client_credentials`.

### Common request headers

```
Content-Type: application/x-www-form-urlencoded
```

### Grant: `authorization_code`

| Param | Required | Notes |
|---|:-:|---|
| `grant_type` | ✓ | `authorization_code` |
| `code` | ✓ | The code from the `/authorize` redirect. |
| `code_verifier` | ✓ | The PKCE verifier paired with the `code_challenge`. |
| `client_id` | ✓ | |
| `client_secret` | *confidential only* | Required for confidential clients; rejected for public clients. |
| `redirect_uri` | ✓ | Must match the value sent to `/authorize`. |

### Grant: `refresh_token`

| Param | Required | Notes |
|---|:-:|---|
| `grant_type` | ✓ | `refresh_token` |
| `refresh_token` | ✓ | The refresh token you currently hold. |
| `client_id` | ✓ | |
| `client_secret` | *confidential only* | |
| `scope` | – | Optional — narrow the granted scope; cannot widen. |

The old refresh token is invalidated when this call succeeds. Persist the
new one before discarding the old one.

### Grant: `client_credentials`

Issues a token bound to the **client's owner account**, with no end user
involved. Confidential clients only.

| Param | Required | Notes |
|---|:-:|---|
| `grant_type` | ✓ | `client_credentials` |
| `client_id` | ✓ | |
| `client_secret` | ✓ | |
| `scope` | – | Defaults to the client's `allowedScopes`. |

### Successful response

```json
{
  "access_token": "civitai_…",
  "token_type": "Bearer",
  "expires_in": 3600,
  "refresh_token": "civitai_…",
  "scope": "114689"
}
```

`refresh_token` is omitted for `client_credentials`.

### Error responses

[RFC 6749 §5.2](https://datatracker.ietf.org/doc/html/rfc6749#section-5.2) error envelope:

```json
{ "error": "invalid_grant", "error_description": "…" }
```

| `error` | Meaning |
|---|---|
| `invalid_grant` | Code is unknown, already used, expired, or PKCE verifier didn't match. Refresh token unknown or expired. |
| `invalid_client` | `client_id` unknown, or confidential client supplied wrong `client_secret`. |
| `invalid_request` | Missing required param. |
| `unsupported_grant_type` | `grant_type` not in the list above. |
| `invalid_scope` | Requested scope exceeds the client's `allowedScopes` or the original consent. |

**Rate limit:** 20 requests / minute / `client_id` → `429`.

**CORS:** permissive — this endpoint is designed to be called from third-party origins.

***

## `POST /api/auth/oauth/revoke`

Invalidate an access or refresh token ([RFC 7009](https://datatracker.ietf.org/doc/html/rfc7009)).

### Request

```
Content-Type: application/x-www-form-urlencoded
```

| Param | Required | Notes |
|---|:-:|---|
| `token` | ✓ | The token string to revoke. |
| `token_type_hint` | – | `access_token` or `refresh_token` — Civitai tries both regardless, the hint is an optimization. |
| `client_id` | *public via session* | Required for the client-credentials authentication path. |
| `client_secret` | *confidential clients* | Required if you're authenticating as a confidential client. |

### Authentication

The caller must prove authority to revoke the token via **one** of:

* A Civitai session cookie (browser context — user revoking their own token).
* `client_id` + `client_secret` on a **confidential** client.

Public clients with no session can't call `/revoke` — that's fine, just
drop the tokens locally. Revoking a token whose owner doesn't match the
caller is silently ignored, per RFC 7009.

### Response

Always `200 {}` regardless of whether the token existed or matched the
caller. **Don't** treat the response as confirmation that the token was
real or that you owned it.

Revoking a **refresh token** also invalidates every access token it minted
for the same `client_id` / user pair.

**Rate limit:** 20 requests / minute / IP.

***

## `GET /api/auth/oauth/userinfo`

Identify the user behind an access token.

### Request

```
Authorization: Bearer civitai_…
```

No body. Token must include the `UserRead` scope.

### Response

```json
{
  "sub": "12345",
  "id": 12345,
  "username": "ada",
  "image": "https://image.civitai.com/…"
}
```

`sub` is the string form of `id` for compatibility with OIDC consumers.

### Error responses

| Status | `error` | Cause |
|---:|---|---|
| 401 | `invalid_token` | Missing, malformed, or expired bearer token. |
| 403 | `insufficient_scope` | Token doesn't include `UserRead`. |

**CORS:** permissive — call from any origin.

For richer user info (email, links, stats, etc.) use the
[`GET /api/v1/me`](../reference/users) endpoint with the same bearer token.

---

---
url: /site/oauth/quickstart.md
description: End-to-end Authorization Code + PKCE walkthrough with curl.
---

# OAuth Quickstart

This page walks the full Authorization Code + PKCE flow using nothing but
curl. By the end you'll have an access token and a refresh token for a
Civitai user.

Before you start, [register an app](./register-app) so you have a
`client_id` (and a `client_secret` if you marked it confidential), then
export them:

```bash
export CIVITAI_CLIENT_ID="…"
export CIVITAI_CLIENT_SECRET="…"   # confidential clients only
export REDIRECT_URI="https://your-app.example.com/oauth/callback"
```

## 1. Generate a PKCE verifier and challenge

```bash
# 43-128 chars, URL-safe base64
VERIFIER=$(openssl rand -base64 64 | tr -d '+/=\n' | cut -c1-64)
CHALLENGE=$(printf '%s' "$VERIFIER" | openssl dgst -sha256 -binary | openssl base64 | tr '+/' '-_' | tr -d '=\n')
STATE=$(openssl rand -hex 16)
echo "$VERIFIER $CHALLENGE $STATE"
```

Keep `$VERIFIER` in your session store keyed by `$STATE` — you'll need it
when you exchange the code.
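
If your app is not a shell script, the same values can be produced in Python with the standard library. This is a sketch with illustrative function names; the S256 transform itself is the one defined in RFC 7636.

```python
import base64
import hashlib
import secrets


def make_verifier(n_bytes: int = 48) -> str:
    """Random URL-safe verifier; 48 random bytes encode to 64 characters."""
    return secrets.token_urlsafe(n_bytes)


def challenge_for(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_verifier()
challenge = challenge_for(verifier)
state = secrets.token_hex(16)
```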

## 2. Send the user to /authorize

Pick a scope (decimal bitmask — see [Scopes](./scopes)). The example below
asks for `UserRead | AIServicesRead | AIServicesWrite | BuzzRead = 1 + 16384 + 32768 + 65536 = 114689`:

```bash
echo "https://civitai.com/api/auth/oauth/authorize?$(cat <<EOF | tr -d '\n'
response_type=code
&client_id=$CIVITAI_CLIENT_ID
&redirect_uri=$REDIRECT_URI
&scope=114689
&state=$STATE
&code_challenge=$CHALLENGE
&code_challenge_method=S256
EOF
)"
```

Open that URL in a browser. Civitai will sign the user in if they aren't
already, show the consent screen with the requested scopes, and (if the
user approves) redirect them to your `redirect_uri`.

## 3. Receive the callback

The user lands on:

```
https://your-app.example.com/oauth/callback?code=…&state=…
```

**Validate `state`** matches what you stored in step 1. If it doesn't,
reject the response — it's a CSRF attempt or a stale flow. Then look up
the verifier you stashed for that `$STATE`.
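
A sketch of that check in Python, assuming a dict-like per-user session that holds the values from step 1 under the hypothetical keys `oauth_state` and `pkce_verifier`. Both entries are removed on every callback so a state cannot be replayed.

```python
import hmac
from typing import Dict, Optional


def verifier_for_callback(session: Dict[str, str], returned_state: str) -> Optional[str]:
    """Return the stored PKCE verifier if the state matches, else None."""
    expected = session.pop("oauth_state", None)
    verifier = session.pop("pkce_verifier", None)
    if expected is None:
        return None  # no flow in progress for this session
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    if not hmac.compare_digest(expected.encode(), returned_state.encode()):
        return None
    return verifier
```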

## 4. Exchange the code for tokens

```bash
curl -X POST https://civitai.com/api/auth/oauth/token \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=authorization_code" \
  -d "code=$CODE" \
  -d "code_verifier=$VERIFIER" \
  -d "client_id=$CIVITAI_CLIENT_ID" \
  -d "client_secret=$CIVITAI_CLIENT_SECRET" \
  -d "redirect_uri=$REDIRECT_URI"
```

Response:

```json
{
  "access_token": "civitai_…",
  "token_type": "Bearer",
  "expires_in": 3600,
  "refresh_token": "civitai_…",
  "scope": "114689"
}
```

Store both tokens server-side. Never ship them to the browser.

## 5. Call the API

```bash
curl https://civitai.com/api/auth/oauth/userinfo \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```

```json
{ "sub": "12345", "id": 12345, "username": "ada", "image": "https://…" }
```

The same bearer header works for every Civitai endpoint that accepts tokens
— browse [the site reference](../reference/) for what's available.

## 6. Refresh before the access token expires

Access tokens live 1 hour. Swap the refresh token for a fresh pair any time
before then:

```bash
curl -X POST https://civitai.com/api/auth/oauth/token \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=refresh_token" \
  -d "refresh_token=$REFRESH_TOKEN" \
  -d "client_id=$CIVITAI_CLIENT_ID" \
  -d "client_secret=$CIVITAI_CLIENT_SECRET"
```

You get a new `access_token` + `refresh_token` pair. The old refresh token
is invalidated — use the new one going forward.
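
Deciding when to refresh can be sketched in Python by recording an absolute expiry time from `expires_in`. The `TokenSet` type, the function names, and the 60-second safety margin are assumptions for illustration, not part of the API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenSet:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix time at which the access token expires


def from_token_response(body: dict, now: Optional[float] = None) -> TokenSet:
    """Build a TokenSet from the token-endpoint JSON."""
    issued = time.time() if now is None else now
    return TokenSet(body["access_token"], body["refresh_token"],
                    issued + body["expires_in"])


def needs_refresh(tokens: TokenSet, now: Optional[float] = None,
                  skew: float = 60.0) -> bool:
    """True once the access token is within `skew` seconds of expiry."""
    current = time.time() if now is None else now
    return current >= tokens.expires_at - skew
```

After each successful refresh, replace the whole stored `TokenSet`, since the previous refresh token no longer works.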

## 7. Revoke when the user signs out

```bash
curl -X POST https://civitai.com/api/auth/oauth/revoke \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "token=$REFRESH_TOKEN" \
  -d "token_type_hint=refresh_token" \
  -d "client_id=$CIVITAI_CLIENT_ID" \
  -d "client_secret=$CIVITAI_CLIENT_SECRET"
```

Revoking a refresh token also revokes the access tokens it minted. The
endpoint always returns `200 {}` regardless of whether the token existed
(per [RFC 7009](https://datatracker.ietf.org/doc/html/rfc7009)) — don't
treat the response as confirmation.

::: warning Public clients
If your client is **public** (no `client_secret`) and the user revokes
consent from civitai.com, you can't call `/revoke` to clean up — revoke
requires authentication. That's fine: the user already cut you off. Just
discard the tokens locally.
:::

## Full working code

[civitai/civitai-oauth-demo](https://github.com/civitai/civitai-oauth-demo)
is a complete Node.js / Express reference implementation of every step
above — authorize, exchange, refresh, revoke. Clone it, plug in your
credentials, and you have a working OAuth integration to study or fork.

---

---
url: /site/oauth/scopes.md
description: >-
  The full scope bitmask used by Civitai OAuth tokens, plus the four presets
  exposed in the app-registration UI.
---

# OAuth Scopes

Civitai OAuth scopes are bitwise flags. To request multiple scopes, OR them
together and pass the **decimal integer** as the `scope` parameter on the
`/authorize` URL.

```text
scope = UserRead | AIServicesRead | AIServicesWrite | BuzzRead
      = 1 | 16384 | 32768 | 65536
      = 114689
```

## Scope reference

| Bit | Value | Scope | What it grants |
|---:|---:|---|---|
| 0 | 1 | `UserRead` | Read profile & settings |
| 1 | 2 | `UserWrite` | Update profile & settings |
| 2 | 4 | `ModelsRead` | Browse & download models |
| 3 | 8 | `ModelsWrite` | Upload & edit models |
| 4 | 16 | `ModelsDelete` | Delete models |
| 5 | 32 | `MediaRead` | View images, videos & posts |
| 6 | 64 | `MediaWrite` | Upload media & create posts |
| 7 | 128 | `MediaDelete` | Delete media & posts |
| 8 | 256 | `ArticlesRead` | Read articles |
| 9 | 512 | `ArticlesWrite` | Create & edit articles |
| 10 | 1 024 | `ArticlesDelete` | Delete articles |
| 11 | 2 048 | `BountiesRead` | View bounties |
| 12 | 4 096 | `BountiesWrite` | Create & manage bounties **(buzz spend)** |
| 13 | 8 192 | `BountiesDelete` | Delete bounties |
| 14 | 16 384 | `AIServicesRead` | View generation & training history |
| 15 | 32 768 | `AIServicesWrite` | Generate, train & scan **(buzz spend)** |
| 16 | 65 536 | `BuzzRead` | View buzz balance & history |
| 17 | 131 072 | `CollectionsRead` | View collections |
| 18 | 262 144 | `CollectionsWrite` | Manage collections |
| 19 | 524 288 | `SocialWrite` | Follow, react, comment & review |
| 20 | 1 048 576 | `SocialTip` | *Reserved — see below* |
| 21 | 2 097 152 | `NotificationsRead` | Read notifications |
| 22 | 4 194 304 | `NotificationsWrite` | Manage notification preferences |
| 23 | 8 388 608 | `VaultRead` | View vault |
| 24 | 16 777 216 | `VaultWrite` | Manage vault |
| — | 33 554 431 | `Full` | All scopes |

## Buzz-spend scopes

Two scopes carry an implicit buzz-spend authorization — granting them
lets your app draw from the user's buzz balance:

* **`AIServicesWrite`** — every generation/training/scan request the app
  makes is billed to the consenting user. This is the only scope subject
  to the per-app [buzz limit](./buzz-limits) cap users can set at consent.
* **`BountiesWrite`** — bounty creation costs buzz at post time. Spend is
  gated by the user's overall balance only; per-app caps don't apply.

(`SocialTip` would be a third — see the reserved note below.)

Pair either with **`BuzzRead`** if you want to surface the user's
remaining balance in your UI before a spend.

::: warning SocialTip is currently reserved
The `SocialTip` bit is defined in the scope enum but every server endpoint
that requires it (tipping, donation goals, event tipping) is gated by
`blockApiKeys: true`, which denies all API-key and OAuth callers
regardless of scope. Granting `SocialTip` today is a no-op. The bit stays
reserved (and locked at 1<<20) so we don't reshuffle the bitmask when
tipping is unblocked for OAuth in the future.
:::

## Presets

The app-registration UI exposes four convenience presets. You can also
target them from the `scope` URL parameter directly:

| Preset | Decimal | Scopes |
|---|---:|---|
| **Read Only** | 10 701 093 | All `*Read` scopes |
| **Creator** | 11 492 205 | Read Only + Models / Media / Articles / Bounties / Collections Write + SocialWrite |
| **AI Services** | 114 688 | `AIServicesRead` \| `AIServicesWrite` \| `BuzzRead` |
| **Full Access** | 33 554 431 | Every defined scope |

Use a preset's number on `/authorize` and the consent screen will still show
the user every flag underneath — there's no shortcut around per-scope consent.
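
The preset numbers can be reproduced from the scope table with a few lines of Python. The `Scope` enum below is a hypothetical mirror of that table, not an official SDK type; the helper names are illustrative.

```python
from enum import IntFlag
from typing import Iterable, List


class Scope(IntFlag):
    """Hypothetical mirror of the scope table on this page."""
    UserRead = 1 << 0
    UserWrite = 1 << 1
    ModelsRead = 1 << 2
    ModelsWrite = 1 << 3
    ModelsDelete = 1 << 4
    MediaRead = 1 << 5
    MediaWrite = 1 << 6
    MediaDelete = 1 << 7
    ArticlesRead = 1 << 8
    ArticlesWrite = 1 << 9
    ArticlesDelete = 1 << 10
    BountiesRead = 1 << 11
    BountiesWrite = 1 << 12
    BountiesDelete = 1 << 13
    AIServicesRead = 1 << 14
    AIServicesWrite = 1 << 15
    BuzzRead = 1 << 16
    CollectionsRead = 1 << 17
    CollectionsWrite = 1 << 18
    SocialWrite = 1 << 19
    SocialTip = 1 << 20
    NotificationsRead = 1 << 21
    NotificationsWrite = 1 << 22
    VaultRead = 1 << 23
    VaultWrite = 1 << 24


SINGLE_BITS = [Scope(1 << bit) for bit in range(25)]


def combine(scopes: Iterable[Scope]) -> int:
    """OR the given scopes together into a decimal bitmask."""
    mask = 0
    for scope in scopes:
        mask |= int(scope)
    return mask


def names(mask: int) -> List[str]:
    """List the scope names set in a bitmask, lowest bit first."""
    return [s.name for s in SINGLE_BITS if mask & int(s)]


READ_ONLY = combine(s for s in SINGLE_BITS if s.name.endswith("Read"))
CREATOR = READ_ONLY | combine([
    Scope.ModelsWrite, Scope.MediaWrite, Scope.ArticlesWrite,
    Scope.BountiesWrite, Scope.CollectionsWrite, Scope.SocialWrite,
])
AI_SERVICES = combine([Scope.AIServicesRead, Scope.AIServicesWrite, Scope.BuzzRead])
FULL = combine(SINGLE_BITS)

print(READ_ONLY, CREATOR, AI_SERVICES, FULL)
```

`names(mask)` is handy for logging what a token's `scope` value actually contains.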

## Asking for less than you registered

The `allowedScopes` you set during [registration](./register-app) is a
ceiling, not a floor. You can ask for any subset of those bits on any
individual `/authorize` call — useful when one user only needs read access
but another wants to spend buzz, and you want a single registered app for
both.

If you request a scope bit your app isn't registered for, the user is
shown the consent screen anyway but Civitai will trim the token to your
allowed scopes when it issues it. Read the `scope` value back from the
token-endpoint response — it's authoritative.
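
The bit arithmetic behind that trimming can be illustrated in Python. The server does the actual trimming; this sketch only shows how to compare what you requested with the `scope` you got back (which arrives as a decimal string). Function names are illustrative.

```python
def trimmed_scope(requested: int, allowed: int) -> int:
    """Bits that survive when a request is limited to allowedScopes."""
    return requested & allowed


def dropped_bits(requested: int, granted: int) -> int:
    """Bits that were requested but are missing from the granted scope."""
    return requested & ~granted


requested = 114689           # UserRead | AIServicesRead | AIServicesWrite | BuzzRead
allowed = 114688             # app registered with the AI Services preset only
granted = int("114688")      # "scope" from the token response, parsed as an integer

print(trimmed_scope(requested, allowed))  # bitmask the app can actually hold
print(dropped_bits(requested, granted))   # non-zero means something was trimmed
```

In this example the dropped bit is `1` (`UserRead`), so a later `/userinfo` call with that token would be expected to fail with `insufficient_scope`.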

## Checking scopes at the API boundary

Pass the access token as `Authorization: Bearer <token>` on any Civitai
endpoint. The endpoint returns `403 insufficient_scope` if the token's
scope is missing the bit it requires:

```json
{
  "error": "insufficient_scope",
  "error_description": "Token does not have UserRead scope"
}
```

That's the signal to re-run the flow with a wider scope (or, more often,
to display "this action needs additional permissions" in your UI).

---

---
url: /orchestration/recipes/openai.md
---

# OpenAI image generation

The orchestrator routes OpenAI image requests to OpenAI's hosted APIs via the `imageGen` step. Five models, each with its own behaviour and quality tier:

| `model` | Operations | Notes |
|---------|------------|-------|
| `gpt-image-2` | `createImage` / `editImage` | **Default** — latest GPT-Image model. Arbitrary `width`/`height` (not fixed presets), optional `maskImage` for regional edits. No `background` control. |
| `gpt-image-1.5` | `createImage` / `editImage` | Previous flagship. Fixed `size` enum, 4 images max, quality + background controls. |
| `gpt-image-1` | `createImage` / `editImage` | Older GPT-Image. Up to 10 images per call. Supports `background: "transparent"`. |
| `dall-e-3` | `createImage` only | Stand-alone `natural` vs `vivid` style control, `standard` / `hd` quality, up to 1792 px. |
| `dall-e-2` | `createImage` / `editImage` | Legacy. Only supports square outputs (256² / 512² / 1024²) and 1000-char prompts. Use only for compatibility reasons. |

**Default choice for new integrations**: `model: "gpt-image-2"`. It's OpenAI's latest flagship with flexible output dimensions and mask-based editing. Fall back to `gpt-image-1.5` if you need `background` control or prefer the fixed `size` enum; `gpt-image-1` for transparent backgrounds or `quantity > 4`; `dall-e-3` for style-controlled vivid output; avoid `dall-e-2` unless you specifically need it.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* For `editImage`: one or more source image URLs, data URLs, or Base64 strings
* For `gpt-image-2` mask-based edits: a mask image whose fully transparent pixels (alpha = 0) indicate the region to edit

## gpt-image-2 (default)

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-2",
      "operation": "createImage",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "width": 1024,
      "height": 1024,
      "quantity": 1,
      "quality": "high"
    }
  }]
}
```

### Parameters

| Field | Default | Allowed | Notes |
|-------|---------|---------|-------|
| `prompt` | — ✅ | ≤ 32 000 chars | Natural-language works best. |
| `width` | `1024` | 256–3840, multiple of 16 | Explicit width in pixels. |
| `height` | `1024` | 256–3840, multiple of 16 | Explicit height in pixels. |
| `quantity` | `1` | `1`–`4` | |
| `quality` | `high` | `low` / `medium` / `high` | Drives pricing (see [Cost](#cost) below). |
| `outputFormat` | `jpeg` | `jpeg` / `png` / `webp` | Inherited from the `imageGen` step. |

### Editing (`editImage`)

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-2",
      "operation": "editImage",
      "prompt": "Make it a winter scene with snow falling",
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

In edit mode `width` / `height` are **optional**. When both are omitted, the output size is inferred from the input images (`image_size: "auto"` is sent to the model). To force explicit output dimensions, set both fields.

To restrict the edit to a specific region, pass `maskImage` — a URL, data URL, or Base64 string where fully transparent pixels (alpha = 0) mark the area to modify:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-2",
      "operation": "editImage",
      "prompt": "Replace the background with a tropical beach",
      "images": ["https://image.civitai.com/.../source.jpeg"],
      "maskImage": "https://image.civitai.com/.../mask.png"
    }
  }]
}
```

Only the first image in `images[]` is masked; additional reference images are ignored by the mask.

## gpt-image-1.5

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-1.5",
      "operation": "createImage",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "size": "1024x1024",
      "quantity": 1,
      "quality": "high"
    }
  }]
}
```

### Parameters

| Field | Default | Allowed | Notes |
|-------|---------|---------|-------|
| `prompt` | — ✅ | ≤ 32 000 chars | Natural-language works best. |
| `size` | `1024x1024` | `1024x1024` / `1536x1024` / `1024x1536` | Exact pixel dimensions as a string, not width/height. |
| `quantity` | `1` | `1`–`4` | Lower cap than `gpt-image-1` (which allows up to 10). |
| `background` | `auto` | `auto` / `transparent` / `opaque` | Transparent backgrounds require PNG-compatible output. |
| `quality` | `high` | `low` / `medium` / `high` | 1.5 doesn't expose `auto`; pick one explicitly. |

### Editing (`editImage`)

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-1.5",
      "operation": "editImage",
      "prompt": "Make it a winter scene with snow falling",
      "size": "1024x1024",
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

Pass one or more source images in `images[]`. The Edit shape also accepts a `mask` (string URL/DataURL/Base64) if you want to inpaint a specific region.

## gpt-image-1 (older GPT-Image, transparent backgrounds)

Same shape as `gpt-image-1.5` but older weights. Reach for it when you need:

* `quantity > 4` (up to 10)
* `background: "transparent"` (1.5 also supports it but older clients may already be wired for 1)
* `quality: "auto"`

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-1",
      "operation": "createImage",
      "prompt": "A stylized logo mark for a coffee brand, simple vector illustration",
      "size": "1024x1024",
      "quantity": 1,
      "background": "transparent",
      "quality": "high"
    }
  }]
}
```

`size`, `background`, `images[]`, and `mask` work exactly like 1.5; the only differences are that `quantity` goes up to 10 and `quality` accepts `auto` as a fourth option.

## dall-e-3 (style-controlled, `natural` vs `vivid`)

`dall-e-3` is create-only and exposes a style dimension the GPT-Image models don't:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "dall-e-3",
      "operation": "createImage",
      "prompt": "A majestic fantasy landscape with floating islands",
      "size": "1024x1024",
      "style": "vivid",
      "quality": "hd"
    }
  }]
}
```

| Field | Default | Allowed | Notes |
|-------|---------|---------|-------|
| `prompt` | — ✅ | ≤ 4 000 chars | DALL·E 3 silently rewrites short/vague prompts — write verbose directives for control. |
| `size` | — ✅ | `1024x1024` / `1792x1024` / `1024x1792` | Required field on DALL·E 3 (unlike GPT-Image, which has a default). |
| `style` | `vivid` | `natural` / `vivid` | `vivid` pushes hyper-real colours; `natural` stays closer to the prompt. |
| `quality` | `auto` | `auto` / `standard` / `hd` | All quality tiers cost the same Buzz (300 flat per image); `hd` increases runtime only. |

DALL·E 3 doesn't support `editImage` or multiple samples — `quantity` isn't on the schema.

## dall-e-2 (legacy)

Use only for legacy compatibility. Prompts are capped at 1000 chars, outputs are square only, and output quality is much lower than on newer models:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "dall-e-2",
      "operation": "createImage",
      "prompt": "A vintage postcard illustration of a mountain town",
      "size": "512x512",
      "quantity": 1
    }
  }]
}
```

| Field | Allowed | Notes |
|-------|---------|-------|
| `size` | `256x256` / `512x512` / `1024x1024` | Square only. |
| `prompt` | ≤ 1000 chars | Much tighter than newer models. |
| `quantity` | `1`–`10` | |

Supports `editImage` with `image` (a single source string, not an array), but honestly — use GPT-Image 1 or 1.5 unless you have a reason not to.

## Reading the result

All models emit the standard `imageGen` output:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.png" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
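
A small sketch of pulling the signed URLs out of that response shape. The function name `image_urls` is illustrative; the field names (`steps`, `output`, `images`, `url`) are the ones shown in the JSON above.

```python
from typing import Any, Dict, List


def image_urls(workflow: Dict[str, Any]) -> List[str]:
    """Collect every image URL from every step of a workflow response."""
    urls = []
    for step in workflow.get("steps", []):
        output = step.get("output") or {}  # failed steps may carry no output
        for image in output.get("images", []):
            if image.get("url"):
                urls.append(image["url"])
    return urls
```

Download the files promptly rather than storing the URLs, since the signatures expire.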

## Runtime

OpenAI's API queue is the dominant factor — Civitai routes your request straight through. Typical wall times:

| Model | Per-image wall time | `wait` recommendation |
|-------|---------------------|-----------------------|
| `dall-e-2` | 3–8 s | `wait=30` fine |
| `dall-e-3` (standard) | 10–20 s | `wait=60` fine |
| `dall-e-3` (hd) | 15–40 s | `wait=60` usually fine |
| `gpt-image-1` / `1.5` | 10–30 s per image | `wait=60` fine for `quantity: 1`; fall back to `wait=0` for batches |
| `gpt-image-2` | 15–45 s per image (larger dims take longer) | `wait=60` fine for `quantity: 1`; `wait=0` + poll for batches or 4K |

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

### gpt-image-2 (size-aware)

Unlike the earlier GPT-Image models, `gpt-image-2` pricing scales with output dimensions as well as quality. The orchestrator finds the cheapest canonical tier whose dimensions **cover** your requested `width`/`height` (the comparison is rotation-aware), then bills at that tier:

| Dimensions  | `low` | `medium` | `high` |
|-------------|-------|----------|--------|
| 1024 × 768  | 13    | 52       | 195    |
| 1024 × 1024 | 13    | 78       | 286    |
| 1024 × 1536 | 13    | 65       | 221    |
| 1920 × 1080 | 13    | 52       | 208    |
| 2560 × 1440 | 13    | 78       | 299    |
| 3840 × 2160 | 26    | 143      | 533    |

All values are Buzz per image. Final cost is `tier × quantity`, plus any priority / output-format surcharges applied by the `imageGen` step. Requests above 3840 × 2160 clamp to that row; requests smaller than 1024 × 768 floor to the cheapest row.

In edit mode with `width`/`height` omitted (`image_size: "auto"`), we estimate cost from the first input image's dimensions.

### Other models (flat per-quality)

```
total = base × quantity
```

| Model | `quality` | Base (Buzz per image) |
|-------|-----------|-----------------------|
| `gpt-image-1.5` | `low` | **25** |
| `gpt-image-1.5` | `medium` | **100** |
| `gpt-image-1.5` | `high` (default) | **375** |
| `gpt-image-1` | `low` | 25 |
| `gpt-image-1` | `medium` | 100 |
| `gpt-image-1` | `high` | 375 |
| `gpt-image-1` | `auto` | 300 |
| `dall-e-3` / `dall-e-2` | *(any)* | 300 |

Examples:

* `gpt-image-2` `high`, `1024×1024`, `quantity: 1` → **~286 Buzz**
* `gpt-image-2` `high`, `1920×1080`, `quantity: 2` → **~416 Buzz**
* `gpt-image-2` `medium`, `3840×2160`, `quantity: 1` → **~143 Buzz**
* `gpt-image-1.5` `high`, `quantity: 1` → **~375 Buzz**
* `dall-e-3` `hd`, `quantity: 1` → **~300 Buzz**

For the older models `size`, `background`, and `style` don't change the Buzz price — only `quality` (GPT-Image) and `quantity` do. For `gpt-image-2`, `width` / `height` **also** affect the price via the tier lookup above.
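
As a rough client-side estimator, the sketch below reproduces the worked examples from the two price tables. It is an illustration, not the orchestrator's pricing code: `estimate_buzz` is a hypothetical helper, it only handles `gpt-image-2` requests whose dimensions exactly match a canonical row (in either orientation), and it ignores priority and output-format surcharges. Use `whatif=true` for the authoritative figure.

```python
from typing import Optional

# Flat per-image base prices in Buzz, copied from the table above.
FLAT_BASE = {
    ("gpt-image-1.5", "low"): 25,
    ("gpt-image-1.5", "medium"): 100,
    ("gpt-image-1.5", "high"): 375,
    ("gpt-image-1", "low"): 25,
    ("gpt-image-1", "medium"): 100,
    ("gpt-image-1", "high"): 375,
    ("gpt-image-1", "auto"): 300,
}
DALLE_BASE = 300  # dall-e-3 / dall-e-2, any quality

# Canonical gpt-image-2 tiers, copied from the size-aware table.
GPT_IMAGE_2_TIERS = {
    (1024, 768): {"low": 13, "medium": 52, "high": 195},
    (1024, 1024): {"low": 13, "medium": 78, "high": 286},
    (1024, 1536): {"low": 13, "medium": 65, "high": 221},
    (1920, 1080): {"low": 13, "medium": 52, "high": 208},
    (2560, 1440): {"low": 13, "medium": 78, "high": 299},
    (3840, 2160): {"low": 26, "medium": 143, "high": 533},
}


def estimate_buzz(
    model: str,
    quality: str = "high",
    quantity: int = 1,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> int:
    """Estimate Buzz cost before surcharges; exact canonical sizes only for gpt-image-2."""
    if model == "gpt-image-2":
        # Try the requested orientation first, then the rotated one.
        tier = GPT_IMAGE_2_TIERS.get((width, height)) or GPT_IMAGE_2_TIERS.get((height, width))
        if tier is None:
            raise ValueError("non-canonical dimensions: use whatif=true for the real price")
        base = tier[quality]
    elif model in ("dall-e-3", "dall-e-2"):
        base = DALLE_BASE
    else:
        base = FLAT_BASE[(model, quality)]
    return base * quantity
```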

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "size must be one of" | Sent `1536x1024` to DALL·E 3 (it wants `1792x1024`) or `1792x1024` to GPT-Image | Match the size enum for the model you're using — the tables above list each model's allowed set. |
| `400` with "style is not a valid property" | Sent `style` outside DALL·E 3 | Only DALL·E 3 exposes `style`. |
| `400` with "quantity must be ≤ 4" on `gpt-image-1.5` | Sent a `quantity` above 4, which only `gpt-image-1` allows | Drop to `quantity: 4` or use `gpt-image-1` (up to 10). |
| `400` with "prompt too long" on `dall-e-2` | DALL·E 2's 1000-char prompt cap | Trim the prompt or move to a newer model. |
| Output is unexpectedly stylised on DALL·E 3 | `style: "vivid"` default | Set `style: "natural"` for closer-to-prompt output. |
| Output is PNG when you wanted JPEG | Transparent backgrounds force PNG | Set `background: "opaque"` if you want JPEG, or leave `outputFormat` unset and accept whatever OpenAI returns. |
| Request timed out (`wait` expired) | Large `quantity` or `hd` on DALL·E 3 during busy periods | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | OpenAI's content filter or Civitai moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |
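
Several of the `400` rows above can be caught before the request leaves your client. The following is a hedged sketch of such a pre-flight check: `preflight` and `LIMITS` are hypothetical names, and the limits are transcribed from the per-model tables on this page (the `gpt-image-1` size set is taken to match 1.5, as stated above). `gpt-image-2` and unknown models are deliberately not covered.

```python
from typing import Any, Dict, List

GPT_SIZES = {"1024x1024", "1536x1024", "1024x1536"}

# Per-model limits transcribed from this page; None means "not stated / not applicable".
LIMITS = {
    "gpt-image-1.5": {"sizes": GPT_SIZES, "max_quantity": 4, "max_prompt": 32000, "style": False},
    "gpt-image-1": {"sizes": GPT_SIZES, "max_quantity": 10, "max_prompt": None, "style": False},
    "dall-e-3": {"sizes": {"1024x1024", "1792x1024", "1024x1792"}, "max_quantity": None, "max_prompt": 4000, "style": True},
    "dall-e-2": {"sizes": {"256x256", "512x512", "1024x1024"}, "max_quantity": 10, "max_prompt": 1000, "style": False},
}


def preflight(step_input: Dict[str, Any]) -> List[str]:
    """Return likely validation problems for an OpenAI imageGen input."""
    limits = LIMITS.get(step_input.get("model"))
    if limits is None:
        return []  # gpt-image-2 or unknown model: nothing checked here
    problems = []
    size = step_input.get("size")
    if size is not None and size not in limits["sizes"]:
        problems.append(f"size {size} is not in the allowed set for this model")
    if "style" in step_input and not limits["style"]:
        problems.append("style is only exposed on dall-e-3")
    quantity = step_input.get("quantity")
    if quantity is not None:
        if limits["max_quantity"] is None:
            problems.append("quantity is not on the schema for this model")
        elif quantity > limits["max_quantity"]:
            problems.append(f"quantity must be <= {limits['max_quantity']}")
    if limits["max_prompt"] is not None and len(step_input.get("prompt", "")) > limits["max_prompt"]:
        problems.append(f"prompt exceeds {limits['max_prompt']} chars")
    return problems
```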

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Google image generation](./google) — alternative commercial tier with Imagen 4 and Nano Banana
* [Gemini image generation](./gemini) — Google's Gemini 2.5 Flash Image direct-API route
* [Flux 2](./flux2) / [Flux 1](./flux1) / [Qwen](./qwen) — open-weights alternatives on Civitai-hosted workers
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref`
* Full parameter catalog: the `OpenAIGpt1CreateImageInput`, `OpenAIGpt1EditImageInput`, `OpenAIGpt15CreateImageInput`, `OpenAIGpt15EditImageInput`, `OpenAIDallE3CreateImageGenInput`, `OpenAIDallE2CreateImageGenInput`, `OpenAIDallE2EditImageInput` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface

---

---
url: /site/guide/pagination.md
description: Page-based vs. cursor-based pagination on the Civitai site API.
---

# Pagination

Most list endpoints (`/models`, `/images`, `/creators`, `/tags`) support both
page-based and cursor-based pagination. Choose cursor-based for anything
beyond a handful of pages.

## Page-based

```http
GET /api/v1/models?page=1&limit=100
```

* `page` is **1-indexed**.
* `limit` caps at 100 for `/models`, 200 for `/images`, 200 for `/creators`/`/tags`.
* `page * limit` may not exceed **1000**. Beyond that the API returns
  `429 Too Many Requests` with the message
  `"You've requested too many pages, please use cursors instead"`.
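
The window rule is easy to check before sending a request. A minimal sketch, with `page_request_allowed` as an illustrative helper name; only the 1-indexing and the 1000 ceiling come from the rules above.

```python
MAX_PAGE_WINDOW = 1000  # page * limit may not exceed this


def page_request_allowed(page: int, limit: int) -> bool:
    """Return True when a page-based request stays within the paging window."""
    return page >= 1 and limit >= 1 and page * limit <= MAX_PAGE_WINDOW
```

With `limit=100` the last reachable page is 10; with `limit=200` it is 5. Past that point, switch to cursors.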

Response metadata when paging:

```json
{
  "items": [ ... ],
  "metadata": {
    "totalItems": 84916,
    "currentPage": 1,
    "pageSize": 100,
    "totalPages": 850,
    "nextPage": "https://civitai.com/api/v1/creators?page=2&limit=100"
  }
}
```

Not every endpoint reports `totalItems` / `totalPages` — some report `0` when
an exact count isn't cheap to compute (notably `/tags`). Use `nextPage`, not
the counts, to drive "load more" UIs.

## Cursor-based

```http
GET /api/v1/models?limit=100&cursor=75363|932023|257749
```

* Cursors are **opaque strings** — don't try to parse them. Treat them as tokens.
* Keep calling with `nextCursor` until it's missing from the response.
* Cursor-based pagination is required when using `?query=<text>` (Meilisearch
  full-text search). Combining `page` with `query` returns
  `400 Bad Request`.

Cursor metadata:

```json
{
  "items": [ ... ],
  "metadata": {
    "nextCursor": "75363|932023|257749",
    "nextPage": "https://civitai.com/api/v1/models?limit=100&cursor=..."
  }
}
```

When `nextCursor` is absent or null, you've reached the end.
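
The loop can be sketched as a generator. To keep the example independent of any HTTP client, it takes a caller-supplied `fetch_page(cursor)` function (a hypothetical hook, not a Civitai API) that performs the actual `GET` and returns the parsed JSON body.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iterate_items(fetch_page: Callable[[Optional[str]], Dict[str, Any]]) -> Iterator[Any]:
    """Yield every item across pages, following metadata.nextCursor until it runs out."""
    cursor: Optional[str] = None  # first request carries no cursor
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = (page.get("metadata") or {}).get("nextCursor")
        if not cursor:  # absent or null: end of the listing
            return
```

In a real integration `fetch_page` would request something like `/api/v1/models?limit=100&cursor=<cursor>`, URL-encoding the cursor and passing it back unchanged.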

## When to prefer cursors

* Deep paging (more than ~10 pages at `limit=100`).
* Any query using `?query=...` for full-text search.
* Iterating through the whole catalog — cursors stay correct even as new
  content is added between calls, while page-based traversal can skip or
  duplicate results.

Keep `limit` as large as the endpoint allows (usually 100 or 200) to minimize
round trips.

---

---
url: /site/reference/permissions.md
description: >-
  Check whether a caller (or another user) is allowed to generate from a given
  resource.
---

# Permissions

Some Civitai resources are gated — most commonly by the **early-access** window
on a model version. This endpoint lets you check, in bulk, whether a user is
allowed to use a given resource for generation before you submit it to the
[Orchestration API](/orchestration/).

## Check generation permission

```http
GET /api/v1/permissions/check
```

**Auth:** Public. The check runs against the user identified by the `userId`
query param (anonymous when omitted). Bearer tokens are **not** used to scope
this endpoint — pass `userId` explicitly when you need to check permissions
for a specific user.

### Query parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `entityIds` | comma-separated integers | — | The IDs to check. Required. |
| `entityType` | `ModelVersion` | `ModelVersion` | The kind of entity. Currently only model versions are supported. |
| `permission` | `Generate` | `Generate` | Which permission to check. Currently only `Generate` is supported. |
| `userId` | integer | — | Run the check for this user; the check is anonymous when omitted. Useful for partner integrations that broker requests for many users. |

### Response

A flat object mapping each `entityId` to a boolean.

```json
{
  "2514310": true,
  "2402203": false
}
```

`true` means the resource can be used to generate; `false` means it's gated
and the user does not currently have access (e.g. early-access window is
active and they haven't paid for it, or it's marked `Private` and they're not
the owner).

When `entityIds` is empty, the response is an empty array (`[]`).

### Errors

| Status | Body | Cause |
|--------|------|-------|
| `400` | `{"error":"Could not parse provided model versions array."}` | Missing or malformed `entityIds`. |
| `400` | `{"error":"Invalid permission"}` | `permission` not recognised. |
| `500` | `{"message":"An unexpected error occurred", "error": ...}` | Internal failure. |

### Example

```bash
# Anonymous check across two versions
curl "https://civitai.com/api/v1/permissions/check?entityIds=2514310,2402203"

# Check on behalf of a specific user
curl "https://civitai.com/api/v1/permissions/check?entityIds=2514310&userId=12345"
```
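
The same calls can be prepared from Python. This sketch only builds the URL and interprets the response; the helper names `build_check_url` and `allowed_ids` are illustrative, and fetching the URL is left to whichever HTTP client you use.

```python
from typing import Any, Iterable, List, Optional
from urllib.parse import urlencode

BASE_URL = "https://civitai.com/api/v1/permissions/check"


def build_check_url(entity_ids: Iterable[int], user_id: Optional[int] = None) -> str:
    """Build the query URL; entityIds is sent as a comma-separated list."""
    params = {"entityIds": ",".join(str(i) for i in entity_ids)}
    if user_id is not None:
        params["userId"] = str(user_id)
    # safe="," keeps the commas readable instead of percent-encoding them
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


def allowed_ids(response: Any) -> List[int]:
    """Return the entity IDs whose value is true; tolerates the empty-array case."""
    if not response:
        return []
    return sorted(int(key) for key, value in response.items() if value is True)
```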

::: tip
Combine this with [`GET /model-versions/{id}`](./model-versions): the model
version response already includes `earlyAccessEndsAt` and `earlyAccessConfig`,
which tell you *why* a resource is gated. Use this endpoint when you only
need a yes/no for a specific user.
:::

---

---
url: /orchestration/recipes/prompt-enhancement.md
---

# Prompt enhancement

The `promptEnhancement` step type takes a user-written prompt and rewrites it for a specific image/video generation ecosystem, returning a list of detected issues, actionable recommendations, and the rewritten prompt(s). It runs an LLM under the hood and finishes in well under the synchronous-request window, which makes it one of the few recipes you can call with `wait=60` and get the full result back inline.

Common uses:

* Transform short user prompts into detailed, ecosystem-specific prompts before calling `imageGen` / `videoGen`
* Surface issues (vague subject, missing lighting, thin negative prompt) to the end user as inline suggestions
* Enforce constraints ("keep it under 77 tokens", "no adjectives", "English only") via the `instruction` field

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* An ecosystem slug matching your downstream generation step (see [Ecosystems](#ecosystems))

## The simplest request

Inline, synchronous — safe to use `wait=60` since the LLM call typically finishes in a few seconds:

```http
POST https://orchestration.civitai.com/v2/consumer/recipes/promptEnhancement?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "ecosystem": "flux1",
  "prompt": "A photo of a cat sitting on a windowsill"
}
```

The per-recipe endpoint unwraps the response and returns the step output directly (a `PromptEnhancementOutput` with `issues`, `recommendations`, `enhancedPrompt`, and optionally `enhancedNegativePrompt`).

## Via the generic workflow endpoint

Use this path for webhooks, tags, or to chain into another step like `imageGen`:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "promptEnhancement",
    "input": {
      "ecosystem": "sdxl",
      "prompt": "anime character with sword, cool background"
    }
  }]
}
```

## Input fields

See the [`PromptEnhancementInput` schema](/orchestration/reference/operations/InvokePromptEnhancementStepTemplate) for the full definition.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `ecosystem` | ✅ | — | Target ecosystem slug, e.g. `"flux1"`, `"sdxl"`, `"sd1"`, `"ltx2"`. Drives the enhancement style (booru-style tags for SD1/SDXL, natural-language descriptions for Flux, motion cues for video). |
| `prompt` | ✅ | — | The user's original prompt. Non-empty string. |
| `negativePrompt` | | — | Optional. If present, the response also includes `enhancedNegativePrompt`. Most useful on SD1/SDXL where negative prompts carry weight; often unnecessary on Flux/video. |
| `temperature` | | `0.7` | LLM temperature, `0.0`–`1.0`. Lower for conservative rewrites, higher for more creative variation. |
| `instruction` | | — | Optional free-text directive shaping the rewrite (`"keep it under 20 words"`, `"add cinematic lighting cues"`, `"translate to English first"`). Short, specific directives work best. |
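
A minimal sketch of assembling that input on the client, assuming you want to reject empty prompts early and clamp `temperature` into the documented range. `build_enhancement_input` is a hypothetical helper; the field names are the ones in the table above.

```python
from typing import Any, Dict, Optional


def build_enhancement_input(
    ecosystem: str,
    prompt: str,
    negative_prompt: Optional[str] = None,
    temperature: Optional[float] = None,
    instruction: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a promptEnhancement input, omitting optional fields that are unset."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    payload: Dict[str, Any] = {"ecosystem": ecosystem, "prompt": prompt}
    if negative_prompt is not None:
        payload["negativePrompt"] = negative_prompt
    if temperature is not None:
        # Clamp into 0.0-1.0; leaving it unset accepts the server default.
        payload["temperature"] = min(1.0, max(0.0, float(temperature)))
    if instruction:
        payload["instruction"] = instruction
    return payload
```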

## Ecosystems

Most generative ecosystems exposed by the orchestrator have a registered prompt-enhancement template — pass the same slug you'd use on the downstream `imageGen` / `videoGen` step (e.g. `sd1`, `sdxl`, `flux1`, `ltx2`). The template drives the rewrite style — Booru-style tag soup for SD1/SDXL, natural-language sentences for Flux, motion-aware prompts for video ecosystems. An unknown slug falls through to a generic LLM rewrite without a 400, so output quality on unsupported slugs is best-effort rather than a hard error.

## Using `instruction`

`instruction` is a free-text directive the enhancer treats as the primary constraint. Use it to force length limits, style edicts, or translation passes:

```json
{
  "ecosystem": "flux1",
  "prompt": "a dog playing frisbee",
  "instruction": "Keep it under 20 words and emphasize motion."
}
```

A live run against prod with that input returned:

> *"A golden retriever leaping dynamically mid-air to catch a flying frisbee, sharp motion-blurred action shot."* (19 words)

## Enhancing both prompt and negative prompt

When `negativePrompt` is present, the response includes an `enhancedNegativePrompt` tuned to the same ecosystem. Particularly useful for SD1/SDXL where negatives meaningfully steer generations:

```json
{
  "ecosystem": "sdxl",
  "prompt": "anime character with sword",
  "negativePrompt": "ugly, blurry",
  "temperature": 0.4
}
```

Response (from a live run):

```json
{
  "issues": [
    { "description": "Prompt is extremely vague …", "severity": "error" },
    { "description": "Missing quality boosters …",  "severity": "warning" },
    { "description": "Negative prompt is too minimal …", "severity": "warning" }
  ],
  "recommendations": [
    "Prepend quality tags like 'masterpiece, best quality, highly detailed' …",
    "Specify character details (gender, expression, attire), pose, and scene …",
    "Enhance negative prompt with targeted tags for anatomy errors and artifacts."
  ],
  "enhancedPrompt": "masterpiece, best quality, highly detailed, sharp focus, anime style, solo character, heroic pose, wielding large ornate sword, dynamic action stance, …",
  "enhancedNegativePrompt": "low quality, blurry, ugly, deformed, mutated hands, extra limbs, extra fingers, poorly drawn face, bad anatomy, watermark, text, signature, lowres, jpeg artifacts"
}
```

## Chaining: enhance, then generate

The highest-leverage pattern is to drop `promptEnhancement` in front of an `imageGen` / `videoGen` step and feed the enhanced prompt through a `$ref`:

```json
{
  "steps": [
    {
      "$type": "promptEnhancement",
      "name": "enhance",
      "input": {
        "ecosystem": "flux1",
        "prompt": "a cat astronaut in space"
      }
    },
    {
      "$type": "imageGen",
      "name": "hero",
      "input": {
        "engine": "flux2",
        "model": "klein",
        "operation": "createImage",
        "modelVersion": "4b",
        "prompt": { "$ref": "enhance", "path": "output.enhancedPrompt" },
        "width": 1024,
        "height": 1024
      }
    }
  ]
}
```

The `{ "$ref": "enhance", "path": "output.enhancedPrompt" }` reference creates a dependency — `hero` doesn't start until `enhance` succeeds, and its `prompt` field is filled in with the rewritten text at runtime. See [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism).

Match the `ecosystem` on `promptEnhancement` to the downstream model family — `flux1` for Flux, `sdxl` for SDXL, etc. Mismatches yield rewrites that technically parse but lose their edge (e.g. Booru tags in a Flux call, or natural-language sentences in an SDXL call).
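
If you build these workflows in code, the wiring can be wrapped in a small function. The sketch below is illustrative (`enhance_then_generate` is not an SDK function): it copies the caller's `imageGen` input and replaces its `prompt` with the `$ref` object shown above.

```python
from typing import Any, Dict


def enhance_then_generate(
    prompt: str,
    ecosystem: str,
    image_input: Dict[str, Any],
    enhance_name: str = "enhance",
) -> Dict[str, Any]:
    """Return a two-step workflow body: promptEnhancement feeding imageGen."""
    generate_input = dict(image_input)  # shallow copy so the caller's dict is untouched
    generate_input["prompt"] = {"$ref": enhance_name, "path": "output.enhancedPrompt"}
    return {
        "steps": [
            {
                "$type": "promptEnhancement",
                "name": enhance_name,
                "input": {"ecosystem": ecosystem, "prompt": prompt},
            },
            {"$type": "imageGen", "name": "hero", "input": generate_input},
        ]
    }
```

The returned dictionary is the JSON body for `POST /v2/consumer/workflows`.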

## Reading the result

A successful `promptEnhancement` step emits analysis + rewrites:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "promptEnhancement",
    "status": "succeeded",
    "output": {
      "issues": [
        {
          "description": "Prompt is overly generic and vague, lacking details on the cat's appearance …",
          "severity": "warning"
        }
      ],
      "recommendations": [
        "Add descriptive details to the cat (e.g., breed, color, pose) …",
        "Specify lighting, such as 'warm sunlight streaming through the window' …"
      ],
      "enhancedPrompt": "A fluffy tabby cat sitting contentedly on a wooden windowsill …, photorealistic photo, warm golden sunlight streaming through the window …",
      "enhancedNegativePrompt": null
    }
  }]
}
```

Fields:

* **`issues[]`** — each entry has `description` plus `severity` (`"info"`, `"warning"`, or `"error"`). Good for surfacing in a UI as a bulleted list.
* **`recommendations[]`** — actionable plain-text suggestions. Each is a complete sentence.
* **`enhancedPrompt`** — the rewritten prompt, ready to feed back into `imageGen` / `videoGen`.
* **`enhancedNegativePrompt`** — populated only when the input included a `negativePrompt`.
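
For the UI use case, a short sketch of turning `issues[]` into display lines, most severe first. The ordering and the `[severity] description` format are presentation choices made for this example, not something the API prescribes.

```python
from typing import Any, Dict, List

SEVERITY_ORDER = {"error": 0, "warning": 1, "info": 2}


def issues_for_display(output: Dict[str, Any]) -> List[str]:
    """Format issues as '[severity] description', sorted from error down to info."""
    issues = output.get("issues") or []
    ranked = sorted(
        issues,
        key=lambda issue: SEVERITY_ORDER.get(issue.get("severity"), len(SEVERITY_ORDER)),
    )
    return [f"[{issue.get('severity', 'info')}] {issue.get('description', '')}" for issue in ranked]
```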

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Flat-rate:

```
total = 1 Buzz per call
```

That's it — prompt length, ecosystem, `temperature`, `instruction`, and whether you pass a `negativePrompt` all leave the price unchanged. Prompt enhancement is the cheapest step Civitai exposes; drop it in front of expensive generation steps without worrying about the overhead.

## Runtime

LLM-backed, usually 2–10 s per call including queue wait. Safe to use `wait=60` (or even `wait=30`) and get the result inline. Cost is a flat 1 Buzz per call regardless of prompt length; unlike generative steps, there's no per-pixel or per-second multiplier.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "prompt" validation error | Empty or missing `prompt` | Always include a non-empty `prompt`; `minLength: 1` is enforced. |
| `400` with "temperature out of range" | Value outside `0.0`–`1.0` | Clamp client-side; leave unset to accept `0.7`. |
| Output doesn't match the ecosystem style (e.g. Flux-style prose on an SD1 request) | Unknown `ecosystem` slug falling through to a generic template | Use the same slug as the downstream generation step, and verify the spelling. |
| `instruction` not respected | Instruction too long, contradictory, or buried in prose | Keep it to one short directive. "Under 20 words" beats a paragraph. |
| `enhancedNegativePrompt` is `null` | No `negativePrompt` was sent | Include a `negativePrompt` in the input if you need one back. |
| Request timed out (`wait` expired) | Rare — the LLM call shouldn't take >10 s on a warm node | Resubmit with `wait=0` and poll, or retry once. |

## Related

* [`InvokePromptEnhancementStepTemplate`](/orchestration/reference/operations/InvokePromptEnhancementStepTemplate) — the per-recipe endpoint
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/promptEnhancement/openapi.yaml) — standalone OpenAPI 3.1 YAML for this endpoint, ready to import into Postman / Insomnia / OpenAPI Generator
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — generic path for chaining
* [Flux 2](./flux2) / [Flux 1](./flux1) image generation, [WAN video generation](./wan), [LTX2 video generation](./ltx2) — downstream generation recipes that take the rewritten prompt
* [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) — feeding `output.enhancedPrompt` through `$ref`

---

---
url: /orchestration/guide/getting-started.md
---

# Quick start

This page walks you through submitting your first workflow, inspecting the result, and polling for completion.

## Prerequisites

* A Civitai API token (Bearer).
* The orchestrator base URL: `https://orchestration.civitai.com`.

You pass the token as `Authorization: Bearer <token>` on every request.

## 1. Submit a workflow

A workflow is a list of **steps**. Each step has a `$type` (the step type) and an `input`. Submit the workflow to [`POST /v2/consumer/workflows`](/orchestration/reference/operations/SubmitWorkflow).

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?whatif=false&wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [
    {
      "$type": "imageGen",
      "input": {
        "engine": "flux",
        "prompt": "A cat astronaut floating through neon space",
        "width": 1024,
        "height": 1024
      }
    }
  ]
}
```

Key query parameters:

| Param | Default | Purpose |
|-------|---------|---------|
| `whatif` | `false` | If `true`, validates the workflow and returns the resolved plan without executing. Great for CI smoke tests. |
| `wait` | `0` | Seconds to block waiting for completion inline. `0` returns immediately with a pending workflow. Capped by the **100-second request timeout** — if the workflow hasn't finished by then, the response returns with `status: "processing"` and you continue via polling or webhooks. |

## 2. Read the response

The response is the **Workflow** object — the same object you get back from [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) while polling. Key fields:

```json
{
  "id": "wf_01HXYZ...",
  "status": "succeeded",
  "steps": [
    {
      "name": "0",
      "$type": "imageGen",
      "status": "succeeded",
      "output": {
        "blobs": [
          { "id": "blob_...", "url": "https://.../signed-url" }
        ]
      }
    }
  ]
}
```

Statuses progress through: `pending` → `processing` → (`succeeded` | `failed` | `canceled`).

## 3. Poll if you didn't wait inline

Server requests time out at **100 seconds**, so any workflow that runs longer than that (most video jobs, large batches, training) returns while still `processing`. Poll [`GET /v2/consumer/workflows/{workflowId}`](/orchestration/reference/operations/GetWorkflow) until it reaches a terminal state:

```http
GET https://orchestration.civitai.com/v2/consumer/workflows/wf_01HXYZ...
Authorization: Bearer <your-token>
```

A reasonable loop is: 2 s, 5 s, 10 s, 15 s, then 30 s thereafter. Most image jobs finish in under 30 s; video jobs can take several minutes.
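
That schedule can be sketched as follows. The example stays client-agnostic by taking a caller-supplied `get_workflow()` function (a hypothetical hook that performs the `GET` and returns the parsed Workflow object); the `max_polls` cap is an assumption added so the loop cannot run forever.

```python
import time
from typing import Any, Callable, Dict, Iterator

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_delays() -> Iterator[int]:
    """Yield 2, 5, 10, 15 seconds, then 30 seconds indefinitely."""
    yield from (2, 5, 10, 15)
    while True:
        yield 30


def wait_for_workflow(
    get_workflow: Callable[[], Dict[str, Any]],
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 100,
) -> Dict[str, Any]:
    """Poll until the workflow reaches a terminal status, then return it."""
    delays = poll_delays()
    for _ in range(max_polls):
        workflow = get_workflow()
        if workflow.get("status") in TERMINAL_STATUSES:
            return workflow
        sleep(next(delays))
    raise TimeoutError("workflow did not reach a terminal state")
```

Passing `sleep` as a parameter makes the loop easy to test without real waiting.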

Production integrations should use webhooks instead of polling — see [Results & webhooks](./results-and-webhooks).

## 4. Consume outputs

Step outputs are typically **blobs**. Each blob comes back with a signed `url` you can fetch directly. Blob URLs expire; re-fetch the workflow (or call [`GetBlob`](/orchestration/reference/operations/GetBlob)) to refresh.

## What's next

* Try a real recipe end-to-end: [WAN video generation](/orchestration/recipes/wan)
* Browse all recipes: [Recipes](/orchestration/recipes/)
* Go deep on the request/response shapes: [API reference](/orchestration/reference/)

---

---
url: /orchestration/recipes/qwen.md
---

# Qwen image generation

Qwen is Alibaba's image-generation family. The orchestrator exposes two invocation paths, covering different versions of the model family:

| `engine` | Model | Best for | Notes |
|----------|-------|----------|-------|
| `sdcpp` (ecosystem `qwen`) | `20b` | **Default** — Qwen-Image 20B on Civitai workers | `createImage` / `createVariant` / `editImage`. Version picker (`latest`, `2509`, `2512`, `2511`). LoRA support. |
| `fal` | `qwen2` | When you need FAL-hosted inference — including the **Pro** tier | `createImage` / `proCreateImage` / `editImage` / `proEditImage`. `imageSize` enum instead of width/height. No LoRA support. |

**Default choice for new integrations**: `engine: "sdcpp"`, `ecosystem: "qwen"`, `model: "20b"`. Reach for the `fal` path when you specifically want FAL's hosting or the Pro tier.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* For `editImage` / `createVariant`: one or more source image URLs, data URLs, or Base64 strings

## sdcpp — Qwen-Image 20B (default path)

### Text-to-image (`createImage`)

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "qwen",
      "model": "20b",
      "operation": "createImage",
      "version": "latest",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting, highly detailed",
      "width": 1024,
      "height": 1024,
      "cfgScale": 2.5,
      "steps": 20
    }
  }]
}
```

Common sdcpp-qwen parameters:

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `model` | — ✅ | `20b` | Currently only `20b` is exposed. |
| `version` | `latest` | `latest` / `2509` / `2512` (create + variant) / `2511` (edit) | Model release snapshot. `latest` follows whichever is current. |
| `prompt` | — ✅ | ≤ 10 000 chars | Natural-language works well on Qwen. |
| `negativePrompt` | *(none)* | ≤ 10 000 chars | Optional. |
| `width` / `height` | `1024` | `64`–`2048`, divisible by 8 | Qwen-Image 20B is trained at 1024². Well-behaved aspect ratios stay near that pixel count. On `editImage` / `createVariant`, width/height are inferred from the source image if omitted; you may still supply them explicitly. |
| `cfgScale` | `2.5` | `0`–`30` | Lower than most image models — `2`–`4` is the sweet spot. |
| `steps` | `20` | `1`–`150` | `20`–`30` typical. |
| `sampleMethod` | `euler` | enum | [`SdCppSampleMethod`](/orchestration/reference/). |
| `schedule` | `simple` | enum | [`SdCppSchedule`](/orchestration/reference/). |
| `loras` | `{}` | `{ airUrn: strength }` | Stack multiple; `0.6`–`1.0` strengths typical. |
| `quantity` | `1` | `1`–`12` | Number of images per call. |
| `seed` | random | int64 | Pin for reproducibility. |
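The `width` / `height` constraints in the table (64 to 2048, divisible by 8) are easy to enforce client-side before submitting. A minimal sketch follows; `snap_dimension` is a hypothetical helper, not part of the API, and rounding down is just one reasonable choice.

```python
def snap_dimension(value: int, lo: int = 64, hi: int = 2048, multiple: int = 8) -> int:
    """Clamp value to [lo, hi], then round down to a multiple of 8.

    The bounds and the divisibility rule come from the parameter table;
    the helper itself is illustrative only.
    """
    clamped = max(lo, min(hi, value))
    return clamped - (clamped % multiple)
```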

### Picking a `version`

| `version` | Available on | Notes |
|-----------|--------------|-------|
| `latest` | all ops | Follows whatever release Civitai is currently pinning to. Recommended unless you need reproducibility against a specific release. |
| `2509` | all ops | September 2025 release snapshot. |
| `2512` | create / variant | December 2025 release — latest generation-side release at the time of writing. |
| `2511` | edit | November 2025 release — edit-specific snapshot. |

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "qwen",
      "model": "20b",
      "operation": "createImage",
      "version": "2512",
      "prompt": "A majestic mountain landscape at sunrise, cinematic composition",
      "width": 1024,
      "height": 1024,
      "cfgScale": 2.5,
      "steps": 20
    }
  }]
}
```

Pin to a specific `version` when you need reproducible output or are comparing generations across a larger experiment; stick to `latest` for day-to-day use.
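Because the valid `version` values depend on the operation, a client-side pre-check can catch a mismatch before the request is sent. The sketch below only mirrors the table above; `VERSIONS_BY_OPERATION` and `check_version` are hypothetical names, and the server remains the source of truth.

```python
# Allowed versions per operation, copied from the version table above.
VERSIONS_BY_OPERATION = {
    "createImage": {"latest", "2509", "2512"},
    "createVariant": {"latest", "2509", "2512"},
    "editImage": {"latest", "2509", "2511"},
}


def check_version(operation: str, version: str = "latest") -> None:
    """Raise ValueError if `version` is not listed for `operation`."""
    allowed = VERSIONS_BY_OPERATION.get(operation)
    if allowed is None:
        raise ValueError(f"unknown operation: {operation!r}")
    if version not in allowed:
        raise ValueError(
            f"version {version!r} is not valid for {operation}; "
            f"expected one of {sorted(allowed)}"
        )
```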

### With LoRAs

LoRAs are a map of AIR URN → strength:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "qwen",
      "model": "20b",
      "operation": "createImage",
      "prompt": "A detailed anime character in a magical forest, ethereal lighting",
      "width": 1024,
      "height": 1024,
      "cfgScale": 2.5,
      "steps": 20,
      "loras": {
        "urn:air:qwen:lora:civitai:123456@789012": 0.8
      }
    }
  }]
}
```

Only Qwen-tagged LoRAs work on the `qwen` ecosystem. Browse the [Civitai Qwen LoRA catalog](https://civitai.com/models?baseModels=Qwen+Image) for AIR URNs.

### Image-to-image (`createVariant`)

Pass a single source image and a prompt; the model re-imagines it. `strength` controls how far the output departs from the source: `0.0` returns the source unchanged, `1.0` discards it entirely. Width and height are inferred from the source.

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "qwen",
      "model": "20b",
      "operation": "createVariant",
      "prompt": "Turn it into a winter scene with snow falling",
      "image": "https://image.civitai.com/.../source.jpeg",
      "strength": 0.7
    }
  }]
}
```

Note `image` is a plain string URL (not a `{ url: ... }` wrapper), and the field is `strength` (default `0.7`).

### Edit image (`editImage`)

Pass up to **10 reference images** via `images[]` — Qwen-Image's edit variant accepts more sources than most edit pipelines:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "qwen",
      "model": "20b",
      "operation": "editImage",
      "prompt": "Add a rainbow in the sky",
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

Width/height are inferred from the source image(s) when omitted.

## fal — FAL-hosted Qwen2 (with Pro tier)

When you want FAL's hosted inference — including the commercial **Pro** tier via the `proCreateImage` and `proEditImage` operations — use `engine: "fal"`, `model: "qwen2"`:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "fal",
      "model": "qwen2",
      "operation": "createImage",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "imageSize": "square_hd"
    }
  }]
}
```

### FAL-specific parameters

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `prompt` | — ✅ | ≥ 1 char | Natural-language works well on Qwen. |
| `negativePrompt` | *(none)* | ≤ 500 chars | Much tighter limit than sdcpp's 10 000. |
| `imageSize` | `square_hd` | `square_hd` / `square` / `portrait_4_3` / `portrait_16_9` / `landscape_4_3` / `landscape_16_9` | **Enum, not width/height.** FAL doesn't accept arbitrary dimensions on Qwen2. |
| `quantity` | `1` | `1`–`10` | Slightly lower ceiling than sdcpp's 12. |
| `enablePromptExpansion` | `true` | boolean | Model-side prompt expansion — Qwen rewrites your prompt before generation. On by default. |
| `enableSafetyChecker` | `false` | boolean | FAL's safety filter. Off by default. |
| `seed` | random | int32 | Pin for reproducibility. |

FAL Qwen2 does **not** support LoRAs or `uCache` — use the sdcpp path when you need either.
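The FAL-side limits differ enough from sdcpp that a pre-flight check can save a round trip. The following is a rough sketch that only encodes the constraints from the table and note above; `fal_qwen2_problems` is a hypothetical helper and the messages are not the API's actual error strings.

```python
# imageSize values from the parameter table above.
FAL_IMAGE_SIZES = {
    "square_hd", "square",
    "portrait_4_3", "portrait_16_9",
    "landscape_4_3", "landscape_16_9",
}


def fal_qwen2_problems(inp: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(inp.get("prompt") or "") < 1:
        problems.append("prompt must be at least 1 character")
    if len(inp.get("negativePrompt") or "") > 500:
        problems.append("negativePrompt is limited to 500 characters")
    if inp.get("imageSize", "square_hd") not in FAL_IMAGE_SIZES:
        problems.append("imageSize must be one of the enum values")
    if not 1 <= inp.get("quantity", 1) <= 10:
        problems.append("quantity must be between 1 and 10")
    if "width" in inp or "height" in inp:
        problems.append("use imageSize instead of width/height")
    if inp.get("loras"):
        problems.append("LoRAs are not supported on the fal path")
    return problems
```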

### Pro tier (`proCreateImage` / `proEditImage`)

FAL ships a Pro tier that takes the same input shape but routes to a higher-quality backing model. Swap the `operation` to `proCreateImage` for text-to-image or `proEditImage` for editing:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "fal",
      "model": "qwen2",
      "operation": "proCreateImage",
      "prompt": "An epic fantasy battle scene with dragons, cinematic lighting, intricate details",
      "imageSize": "landscape_16_9"
    }
  }]
}
```

Pro costs more and is slower, but delivers stronger prompt adherence and finer detail. Use for hero shots where quality matters more than throughput.

### Edit image (FAL)

FAL's edit operation accepts **1–3** reference images (vs sdcpp's 10):

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "fal",
      "model": "qwen2",
      "operation": "editImage",
      "prompt": "Make it daytime",
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

Swap to `proEditImage` for the Pro tier variant.

## Reading the result

Both engines emit the standard `imageGen` output — an `images[]` array, one entry per `quantity`:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.jpeg" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
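Pulling the signed URLs out of a response shaped like the one above might look like the sketch below. `image_urls` is a hypothetical helper; it assumes the parsed JSON body is already in hand as a `dict` and skips any step that has not succeeded.

```python
from typing import Any


def image_urls(workflow: dict[str, Any]) -> list[str]:
    """Collect `url` values from every succeeded step's `output.images[]`."""
    urls = []
    for step in workflow.get("steps", []):
        if step.get("status") != "succeeded":
            continue
        output = step.get("output") or {}
        for image in output.get("images", []):
            urls.append(image["url"])
    return urls
```

Since the URLs expire, download the bytes promptly rather than storing the URL itself.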

## Runtime

| Path | Typical wall time per 1024×1024 image | `wait` recommendation |
|------|---------------------------------------|-----------------------|
| sdcpp Qwen-Image 20B | 15–35 s | `wait=60` usually fine for `quantity: 1` |
| FAL Qwen2 (create / edit) | 10–30 s depending on FAL queue | `wait=60` usually fine |
| FAL Qwen2 Pro (`proCreateImage` / `proEditImage`) | 20–60 s depending on queue | `wait=60` sometimes; fall back to `wait=0` on busy periods |

Large `quantity` or atypical aspect ratios push toward the 100-second request timeout — submit with `wait=0` and poll.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

**sdcpp Qwen-Image 20B** — per-pixel + per-step scaling, with `editImage` carrying a flat higher base:

```
createImage / createVariant:
  total = 30 × (width × height / 1328²) × (steps / 20) × quantity

editImage:
  total = 40 × (width × height / 1024²) × (steps / 20) × (1 + (numImages − 1) × 0.1) × quantity
```

| Shape | Buzz |
|-------|------|
| `createImage`, 1024²/`steps: 20`/`quantity: 1` | **~18** |
| `createImage`, 1328²/`steps: 20`/`quantity: 1` | ~30 |
| `createImage`, 1024²/`steps: 20`/`quantity: 4` | ~72 |
| `editImage`, 1 ref, 1024²/`steps: 20` | **40** |
| `editImage`, 3 refs, 1024²/`steps: 20` | **48** |

**FAL Qwen2** (`engine: "fal"`) — commercial tier routed through FAL. Base cost is fixed per operation, per image. `createImage` is cheapest; `proCreateImage` / `proEditImage` cost meaningfully more. Use `whatif=true` to confirm pricing for your shape — FAL pricing shifts independently of the sdcpp path.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "ecosystem must be qwen" | Typo | Lowercase `"qwen"` — not `"Qwen"` or `"qwen2"` (that's the fal `model`, not the sdcpp ecosystem). |
| `400` with "version must be one of" | Picked a version that isn't valid for that operation | Edit supports `latest`/`2509`/`2511`; create + variant support `latest`/`2509`/`2512`. |
| `400` with unexpected "width/height" error on edit/variant | Dimensions conflict with source resolution | Omit `width`/`height`; they auto-populate from the source. |
| `400` with "imageSize must be one of" on fal | Arbitrary dimensions on FAL path | FAL Qwen2 uses the enum — pick `square_hd`, `landscape_16_9`, etc. Use the sdcpp path for arbitrary dimensions. |
| `400` with "images maxItems" | More than 10 source images on sdcpp `editImage`, or more than 3 on fal `editImage` | Trim the array. |
| LoRA has no effect on FAL | FAL Qwen2 doesn't support LoRAs | Switch to the sdcpp path. |
| Output ignores the prompt | `cfgScale` too low (Qwen wants ~2.5, far below SD1/SDXL's 7) or `enablePromptExpansion` rewriting your carefully-tuned text | Bump `cfgScale` toward 4 on sdcpp; set `enablePromptExpansion: false` on fal. |
| Request timed out (`wait` expired) | Large `quantity`, Pro tier on busy queue | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Prompt or input image hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Flux 2](./flux2) / [Flux 1](./flux1) image generation — alternative open-weights families
* [SDXL](./sdxl) / [SD1](./sd1) image generation — classic Stable Diffusion ecosystems
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref`
* Full parameter catalog: the `Qwen20b<Operation>Input` (sdcpp) and `Qwen2<Operation>FalImageGenInput` / `Qwen2Pro<Operation>FalImageGenInput` (fal) schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator

---

---
url: /orchestration/recipes.md
---

# Recipes

Task-oriented, end-to-end examples. Each recipe walks through a real workflow: what to send, what you get back, common parameter tweaks, and troubleshooting.

## Video

* [WAN video generation](./wan) — all WAN versions (2.1–2.7) across FAL, Comfy, and Civitai, with text-to-video, image-to-video, reference-to-video, and edit-video operations
* [LTX2 video generation](./ltx2) — Lightricks LTX2 and LTX2.3 on Comfy, including the new videoToVideo (style transfer) and audioToVideo (talking-head) operations
* [Kling video generation](./kling) — Kuaishou Kling (v1/v1.5/v1.6/v2/v2.5-turbo with camera control) and Kling V3 (5 operations, multi-prompt, audio, video-to-video)
* [Vidu video generation](./vidu) — Vidu 2.0 (flat 600 Buzz, anime style, first-last-frame) and Vidu Q3 (per-second pricing, 4 resolution tiers, turbo mode, native audio)
* [Veo 3 video generation](./veo3) — Google Veo 3.0/3.1 in standard / fast / lite tiers; operation inferred from image count; optional synchronized audio track
* [Grok video generation](./grok-video) — xAI Grok-Imagine-Video via FAL; text-to-video, image-to-video, and edit-video with 480p/720p output
* [HunyuanVideo generation](./hunyuan) — Tencent HunyuanVideo on Comfy workers; text-to-video with LoRA support; compute-intensive, always use `wait=0`
* [Video upscaling](./video-upscaler) — FlashVSR, 2–4× with a 2560 px output cap
* [Video frame interpolation](./video-interpolation) — VFIMamba, 2× or 3× frame-count, smooths generated or low-FPS footage

## Image

* [Flux 2 image generation](./flux2) — Flux.2 Klein (default, cheap + capable, 4b/9b, supports createVariant) plus Dev / Flex / Pro / Max for higher-fidelity and commercial tiers
* [Flux 1 image generation](./flux1) — Flux.1 through sdcpp (default, minimal required input) or Comfy, plus the BFL-hosted `flux1-kontext` editing tier
* [Z-Image generation](./zimage) — lightweight text-to-image on sdcpp; `turbo` (default, distilled, extremely fast + cheap) or `base` when you need more fidelity
* [Qwen image generation](./qwen) — Qwen-Image 20B on sdcpp (default) or FAL-hosted Qwen2 with a Pro tier; supports createImage + createVariant + editImage
* [Anima image generation](./anima) — anime-tuned sdcpp ecosystem with built-in diffuser, LoRA support, createImage only
* [ERNIE image generation](./ernie) — Baidu ERNIE Image on Comfy; `ernie` standard + `turbo` distilled variant, built-in diffuser, LoRA support, createImage only
* [SDXL image generation](./sdxl) — Stable Diffusion XL at 1024² native via sdcpp (default) or Comfy, with createImage + createVariant
* [SD1 image generation](./sd1) — classic Stable Diffusion 1.5 at 512² via sdcpp (default) or Comfy, with createImage + createVariant
* [OpenAI image generation](./openai) — GPT-Image 1 / 1.5 and DALL·E 2 / 3 via OpenAI's hosted API
* [Google image generation](./google) — Imagen 4 and Nano Banana Pro / 2 via Vertex AI, with editing + web-search grounding
* [Gemini image generation](./gemini) — Gemini 2.5 Flash Image (same product as Nano Banana) via the direct Gemini API
* [Seedream image generation](./seedream) — ByteDance Seedream v3 / v4 / v4.5 / v5.0-lite with native up-to-4096 output + editing
* [Grok image generation](./grok) — xAI Grok with wide aspect-ratio menu (21 options) + editing
* [WAN image generation](./wan-image) — WAN v2.2 / v2.2-5b / v2.5 / v2.7 via FAL (image counterpart to the WAN video recipe)
* [Image upscaling](./image-upscaler) — ESRGAN-family upscalers, chain after `imageGen` or use standalone

## Audio

* [Transcription](./transcription) — Qwen3-ASR, multilingual, word-level timestamps for captioning
* [Text-to-speech](./text-to-speech) — built-in speakers with optional style prompt, or voice cloning from a reference clip
* [ACE-Step music generation](./ace-step-audio) — full songs from a style description + structured lyrics, 2B turbo default with optional 4B XL overrides; audio-only MP3 or MP4 with a still cover image

## Language models

* [Chat completion](./chat-completion) — any OpenRouter model or Civitai AIR model, vision inputs, tool use, streaming, image generation via `modalities: ["image"]`; OpenAI-compatible `/v1/chat/completions` endpoint or workflow step

## Utilities

* [Prompt enhancement](./prompt-enhancement) — LLM rewrites a user prompt for a target ecosystem (Flux / SDXL / SD1 / LTX2), returns issues + recommendations + enhanced prompt
* [Image conversion](./convert-image) — format conversion (JPEG / PNG / WebP / GIF), resize, and region blur; flat 1 Buzz

## Training

Train a LoRA on your own dataset using AI Toolkit. All training runs are async — submit with `wait=0` and follow up via polling or a webhook. Cost is per-epoch in Buzz; use `whatif=true` to preview the exact charge.

* [SDXL & SD1 LoRA training](./training-sdxl-sd1) — classic Stable Diffusion ecosystems (50 Buzz/epoch each); cheapest pick for first fine-tunes
* [Flux 1 LoRA training](./training-flux1) — Flux.1 Dev or Schnell (200 Buzz/epoch); higher quality, fixed BFL base checkpoints
* [Flux 2 Klein LoRA training](./training-flux2-klein) — Flux 2 Klein 4b / 9b (50 / 100 Buzz/epoch), including image-edit training mode with control reference images
* [Wan video LoRA training](./training-wan) — preview ecosystem for Wan 2.1 / 2.2 video LoRAs (12 Buzz/epoch)
* [LTX2 video LoRA training](./training-ltx2) — Lightricks LTX2 and LTX 2.3 video LoRAs (LTX 2.3 flat 200 Buzz/epoch)
* [Chroma / ERNIE / Qwen / Z-Image LoRA training](./training-other-image) — five smaller image ecosystems consolidated into one page; each section is independently runnable

::: tip Copy-paste runnable
All recipes target `https://orchestration.civitai.com` and use `<your-token>` as a placeholder for your Bearer token. Drop them into curl, HTTPie, VS Code's REST Client, or any tool that speaks HTTP.
:::

---

---
url: /site/reference.md
description: Civitai site API reference — per-resource endpoint documentation.
---

# Reference

All endpoints below live under `https://civitai.com/api/v1/`.

| Resource | Endpoints |
|----------|-----------|
| [Models](./models) | `GET /models`, `GET /models/{id}` |
| [Model versions](./model-versions) | `GET /model-versions/{id}`, `GET /model-versions/by-hash/{hash}`, `POST /model-versions/by-hash`, `POST /model-versions/by-hash/ids`, `GET /model-versions/mini/{id}` |
| [Images](./images) | `GET /images` |
| [Creators](./creators) | `GET /creators` |
| [Tags](./tags) | `GET /tags` |
| [Users](./users) | `GET /me`, `GET /users` |
| [Permissions](./permissions) | `GET /permissions/check` |
| [Vault](./vault) | `GET /vault/get`, `GET /vault/all`, `GET /vault/check-vault`, `POST /vault/toggle-version` |
| [Enums](./enums) | `GET /enums` |

## Conventions used on this page

* **Base URL:** `https://civitai.com/api/v1`
* **Content type:** All responses are `application/json; charset=utf-8`.
* **Auth class** (shown on each endpoint):
  * *Public* — no token required.
  * *Mixed* — works without a token, but some parameters or response fields require one.
  * *Authenticated* — 401 without a valid token.
* **Caching:** Public endpoints set `Cache-Control: public, s-maxage=300, stale-while-revalidate=150`; authenticated calls skip the cache.
* **Region gating:** Responses may be filtered to SFW-only content regardless of the `nsfw` param when the request comes from a restricted region or Civitai's "green" domain. This is silent — you just see fewer results.

See the [Guide](../guide/) for cross-cutting topics (authentication,
pagination, errors, AIR identifiers).

---

---
url: /site/oauth/register-app.md
description: Create an OAuth client from your Civitai account settings.
---

# Registering an OAuth app

OAuth client registration is self-service. Sign in to civitai.com, open
[your account settings](https://civitai.com/user/account), and find the
**OAuth Apps** card. From there you can create new apps, edit their
permissions, and rotate secrets.

![Edit OAuth Application modal showing an example app filled in — name, description, two redirect URIs, and a permissions grid with Profile Read, AI Services Read+Write, and Buzz Read checked.](/images/oauth/edit-oauth-app.png)

The screenshot shows a realistic baseline for any app that lets users
spend their own buzz on AI generation: read their profile, read and write
AI Services, and read their buzz balance.

## Fields

### App name

Shown to users on the consent screen ("**\<App name>** wants to access your
Civitai account"). Pick something users will recognize from your product
surface.

### Description

One sentence shown directly below the name on consent. Tell users what your
app does and where it runs — they're about to grant it access to their
account.

### Redirect URIs

One URI per line. **Exact match** — Civitai will reject any `redirect_uri`
parameter that isn't in this list, character for character. Common patterns:

* One entry per environment (`https://staging.example.com/oauth/callback`,
  `https://app.example.com/oauth/callback`).
* `http://localhost:3000/oauth/callback` for local development. (HTTPS is
  not required for `localhost`; it is required for everything else.)
* A separate callback path if you also support "Sign in with Civitai"
  alongside other providers (e.g. `https://app.example.com` plus
  `https://app.example.com/signin-civitai`).

Changes take effect immediately — no waiting period.
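The rules above (exact match, HTTPS everywhere except `localhost`) can be mirrored in a client-side pre-check. This sketch is an assumption about how one might validate on the app side, not Civitai's implementation; both function names are hypothetical, and only the literal host `localhost` is treated as exempt because the text does not mention other loopback forms.

```python
from urllib.parse import urlparse


def is_acceptable_redirect_uri(uri: str) -> bool:
    """HTTPS is required, except plain HTTP is allowed for localhost."""
    parsed = urlparse(uri)
    if parsed.scheme == "https":
        return bool(parsed.hostname)
    return parsed.scheme == "http" and parsed.hostname == "localhost"


def redirect_uri_allowed(requested: str, registered: list[str]) -> bool:
    """Exact, character-for-character comparison with no normalization."""
    return requested in registered
```

Note that a trailing slash or a different port makes the URI a different string, so it would need its own registered entry.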

### Permission preset

Drop-down with four bundles for the common cases:

| Preset | Good for |
|---|---|
| **Read Only** | Browsers, dashboards, anything that doesn't write or spend buzz. |
| **Creator** | Apps that upload models / media / articles on the user's behalf. |
| **AI Services** | Generation-focused clients: `AIServicesRead \| AIServicesWrite \| BuzzRead`. Pair with `UserRead` for "who is this user?" calls. |
| **Full Access** | Power-user tooling. Avoid for general distribution — users will balk. |

Pick **Custom** to mix and match from the permissions grid below the preset.
See [Scopes](./scopes) for the bit-by-bit breakdown.

### Permissions grid

One row per resource category with Read / Write / Delete columns. Civitai
honors the principle of least privilege at consent time — users see the
exact set you request, so asking for less makes your app easier to approve.

::: warning
Don't pre-check Delete unless your app genuinely needs to delete on the
user's behalf. Most apps that "edit" content really just need Write.
:::

### Confidential vs public client

When you create the app, you choose whether it's confidential:

* **Confidential** — your code runs on a server you control. Civitai issues
  you a `client_secret` you must keep private. Required for the
  `client_credentials` grant and for calling `/revoke`.
* **Public** — your code runs on a device (browser, mobile app, desktop)
  you can't trust to keep a secret. No `client_secret` is issued. PKCE alone
  protects the flow.

Pick **confidential** by default if you have a backend; only choose **public**
when you genuinely can't store a secret.
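For public clients, the PKCE verifier and challenge are generated per authorization attempt. The sketch below follows RFC 7636 and assumes the `S256` challenge method; this page does not state which methods Civitai accepts, so treat that as an assumption. The function names are hypothetical.

```python
import base64
import hashlib
import secrets


def code_challenge(verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) without padding, per RFC 7636 (S256)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for one authorization attempt."""
    # 64 random bytes encode to 86 URL-safe characters, inside the
    # 43 to 128 character range that RFC 7636 requires.
    verifier = secrets.token_urlsafe(64)
    return verifier, code_challenge(verifier)
```

Keep the verifier on the device until the token exchange; only the challenge goes into the authorization request.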

## After you save

Civitai shows you the `client_id` (and `client_secret`, if confidential) on
the success screen. **The secret is shown once.** Copy it into your secret
store immediately — if you lose it, rotate it (see below).

You're ready to run through the [quickstart](./quickstart).

## Rotating the secret

Confidential apps have a **Rotate secret** action in the OAuth Apps card.
Rotating invalidates the old secret immediately, so plan to deploy the new
one to your servers as soon as it is issued. Issued access and refresh
tokens keep working; only your app's ability to mint new ones or call
`/revoke` breaks until you update your config.

## Deleting an app

Deleting an app cascades:

* All access and refresh tokens issued for the app are invalidated.
* All user consents are removed.
* All audit-log entries are retained (deletion doesn't erase history).

Users can also delete their own consent for your app from their **Connected
Apps** card — same outcome on their tokens, no notification to you.

## Verification

The `isVerified` flag is set by Civitai staff for trusted apps and unlocks
nicer consent-screen treatment (verified badge, fewer warnings). Unverified
apps still work end-to-end — verification is purely a trust-signal layer
for the user.

If you ship a production OAuth integration on Civitai, reach out on the
[Civitai Discord](https://civitai.com/discord) to request verification
once you're ready.

---

---
url: /orchestration/guide/results-and-webhooks.md
---

# Results & Webhooks

You have two ways to learn that a workflow finished: **poll** [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) until it reaches a terminal state, or register **callbacks** (webhooks) on the workflow and let the orchestrator push events to you.

For anything longer than the [100-second request timeout](./getting-started#_3-poll-if-you-didn-t-wait-inline) — most video jobs, training, large batches — webhooks are strongly preferred over polling.

## Registering callbacks

Callbacks live on the workflow body you submit via [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow), alongside `steps`:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [
    {
      "$type": "videoGen",
      "input": { "engine": "wan", "version": "v2.6", "operation": "text-to-video",
                 "prompt": "...", "resolution": "1080p", "duration": 10 }
    }
  ],
  "callbacks": [
    {
      "url": "https://your-service.example.com/civitai-hooks",
      "type": ["workflow:succeeded", "workflow:failed"],
      "detailed": true
    }
  ]
}
```

Each entry in `callbacks` (see the [`WorkflowCallback` schema](/orchestration/reference/operations/SubmitWorkflow)) has:

| Field | Required | Notes |
|-------|----------|-------|
| `url` | ✅ | HTTPS endpoint that will receive POSTed events. |
| `type` | ✅ | Array of event types to subscribe to (see below). |
| `detailed` | | If `true`, the payload includes the full workflow / step output (blobs, timings). Defaults to `false` — you'd then call [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) to fetch details. |

You can register multiple callbacks per workflow (different URLs, different event filters).

## Event types

Event types use a `<scope>:<status>` format. Scopes run from coarsest to finest:

| Scope | Fires on | Typical use |
|-------|----------|-------------|
| `workflow:*` | Workflow-level transitions | What you usually want — one event when the whole workflow resolves. |
| `step:*` | Each step transition | Multi-step workflows where you want intermediate output. |
| `job:*` | Each internal job transition | Debugging / observability — usually too noisy for production. |

Statuses you can filter on: `unassigned`, `processing`, `succeeded`, `failed`, `expired`, `canceled`. Use `*` to receive every status for a scope.

Common subscriptions:

* **"Tell me when it's done, pass or fail"** → `["workflow:succeeded", "workflow:failed", "workflow:expired", "workflow:canceled"]`
* **"Everything about the workflow"** → `["workflow:*"]`
* **"Per-step progress in a multi-step pipeline"** → `["step:succeeded", "step:failed"]`
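To make the `<scope>:<status>` filter semantics concrete, here is a small sketch of how a subscription list could be matched against an incoming event type. It illustrates the wildcard rule described above and is not the orchestrator's actual implementation; `matches` and `should_deliver` are hypothetical names.

```python
def matches(subscription: str, event_type: str) -> bool:
    """True if a subscription such as 'workflow:*' covers the event type."""
    sub_scope, _, sub_status = subscription.partition(":")
    scope, _, status = event_type.partition(":")
    return sub_scope == scope and sub_status in ("*", status)


def should_deliver(subscriptions: list[str], event_type: str) -> bool:
    """True if any entry in a callback's `type` array covers the event."""
    return any(matches(sub, event_type) for sub in subscriptions)
```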

## Event payload

The orchestrator POSTs a JSON body to your `url`. For workflow-scoped events:

```json
{
  "$type": "workflow",
  "workflowId": "wf_01HXYZ...",
  "status": "succeeded",
  "timestamp": "2026-04-12T23:00:00Z",
  "details": {
    "createdAt": "2026-04-12T22:58:12Z",
    "startedAt": "2026-04-12T22:58:14Z",
    "completedAt": "2026-04-12T23:00:00Z",
    "steps": [
      { "name": "0", "status": "succeeded", "output": { "blobs": [ /* ... */ ] } }
    ]
  }
}
```

`details` is only present when the callback was registered with `detailed: true`. Without it, you get the transition notification (id + status + timestamp) and fetch the workflow yourself.

Step events use [`WorkflowStepEvent`](/orchestration/reference/operations/SubmitWorkflow) (`workflowId` + `name` + `status` + optional `details`), and job events use [`WorkflowStepJobEvent`](/orchestration/reference/operations/SubmitWorkflow) (adds `jobId`, `progress`, `reason`). Inspect `$type` to route them.

## Delivery semantics

* **In-order, serialized per workflow.** The orchestrator waits for each callback invocation to complete before sending the next one, so `processing` always arrives before `succeeded` for a given workflow / step.
* **Terminal states are terminal.** Once a workflow or step reaches `succeeded`, `failed`, `expired`, or `canceled`, it will not transition back to `processing` or any other state.
* **`processing` can repeat.** You may get multiple `processing` events for the same workflow or step — each one signals progress. If you need the latest details (e.g. partial output), call [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) in response.
* **Automatic retries on transient errors.** If your endpoint returns a non-`2xx` or times out, the orchestrator retries before advancing to the next event.

## Receiving endpoint checklist

* **Return `2xx` quickly.** Do the real work on a queue after acknowledging. Slow receivers delay subsequent events (delivery is serialized) and get retried.
* **Be idempotent on `processing`.** Retries and legitimately repeated `processing` events mean the same event can arrive more than once. Use `(workflowId, status, timestamp)` as your dedupe key, or treat `processing` as "latest progress, refetch if you care."
* **Expose an HTTPS URL.** The orchestrator won't post to plain HTTP.
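The dedupe key suggested in the checklist can be sketched as follows. This is a minimal in-memory version for illustration; `EventDeduper` is a hypothetical class, and a real receiver would more likely keep the keys in a shared store with an expiry so that multiple workers agree.

```python
class EventDeduper:
    """Track (workflowId, status, timestamp) keys that were already handled."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str, str]] = set()

    def is_new(self, event: dict) -> bool:
        """Return True the first time an event key is seen, False on repeats."""
        key = (event["workflowId"], event["status"], event["timestamp"])
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

A handler would acknowledge with `2xx` either way and enqueue work only when `is_new` returns `True`.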

## Blobs in results

Output blobs come back with signed `url` fields inside `details.steps[].output.blobs[]`. Those URLs **expire** — refetch the workflow (or call [`GetBlob`](/orchestration/reference/operations/GetBlob)) to get a fresh signed URL; don't cache the URL long-term. If you need to keep the media, download the bytes and store them yourself.

## Polling fallback

If you can't expose a webhook endpoint (scripts, CLI tools, notebooks), poll [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow). Suggested cadence: 2 s, 5 s, 10 s, 15 s, then 30 s. Stop when `status` is one of `succeeded`, `failed`, `expired`, `canceled`.

## Ephemeral workflows

By default, a submitted workflow is retained for 30 days after it reaches a terminal state, so you can refetch it via [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) for as long as you might need the results.

Set `"ephemeral": true` on the submission body to opt out of that retention entirely — the workflow is never written to long-term storage. The only way to receive its results is a callback or a synchronous `wait`:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=120
Content-Type: application/json

{
  "ephemeral": true,
  "steps": [ /* ... */ ],
  "callbacks": [
    { "url": "https://your-service.example.com/civitai-hooks",
      "type": ["workflow:succeeded", "workflow:failed"],
      "detailed": true }
  ]
}
```

* **Validation.** The orchestrator rejects ephemeral submissions with neither a `callbacks` entry nor `wait > 0` (HTTP 400) — without one of those, you'd have no way to get the result.
* **Use `detailed: true` on callbacks** (or `wait` long enough for terminal status) — once the workflow finishes, [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) returns 404 and you can't go back for it.
* **In-flight polling still works.** While the workflow is running you can poll `GetWorkflow` as usual; only after it reaches a terminal state does the record disappear.
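
The validation rule above is easy to mirror client-side so a bad submission fails before it reaches the network. This is a hedged Python sketch: the function name is hypothetical, and it only restates the rule as described here (an ephemeral submission needs a `callbacks` entry or `wait > 0`); the server remains the authority.

```python
from typing import Any, Dict


def ephemeral_submission_is_valid(body: Dict[str, Any], wait_seconds: int = 0) -> bool:
    """Return False when an ephemeral body has no way to deliver its result."""
    if not body.get("ephemeral"):
        return True  # non-ephemeral workflows can be refetched later
    has_callback = bool(body.get("callbacks"))
    return has_callback or wait_seconds > 0
```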

---

---
url: /orchestration/recipes/sd1.md
---

# SD1 image generation

SD1 is Stable Diffusion 1.x — the original open-weights family. Mature ecosystem, huge model catalog, native resolution **512×512**, prompt style is Booru-like tag-soup plus quality boosters (`masterpiece, best quality, …`). Two invocation paths on the orchestrator:

| `engine` | Best for | Notes |
|----------|----------|-------|
| `sdcpp` (ecosystem `sd1`) | **Default** — Stable Diffusion C++ on Civitai workers | `sampleMethod` + `schedule` sdcpp enums, textual-inversion embeddings, `uCache` mode. |
| `comfy` (ecosystem `sd1`) | When you specifically need ComfyUI sampler/scheduler enums or a Comfy-only feature | `sampler` + `scheduler` Comfy enums, `denoiseStrength` (vs sdcpp's `strength`) on variants. |

**Default choice for new integrations**: `engine: "sdcpp"`, `ecosystem: "sd1"`. Reach for `comfy` only when you specifically need a ComfyUI-exclusive sampler.

Both engines support `createImage` and `createVariant` (img2img). Neither supports `editImage` — use [Flux 1 Kontext](./flux1#flux1-kontext-managed-editing-tier) or [Flux 2 Klein](./flux2#klein-createvariant-img2img) if you need prompt-driven editing.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* An SD1 checkpoint AIR URN — browse the [Civitai SD1.5 catalog](https://civitai.com/models?baseModels=SD+1.5)
* For `createVariant`: a source image URL

## sdcpp (default path)

### Text-to-image

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "sd1",
      "operation": "createImage",
      "model": "urn:air:sd1:checkpoint:civitai:4384@128713",
      "prompt": "masterpiece, best quality, 1girl, solo, portrait, looking at viewer, cinematic lighting",
      "negativePrompt": "worst quality, low quality, blurry, bad anatomy",
      "width": 512,
      "height": 512,
      "cfgScale": 7,
      "steps": 20,
      "clipSkip": -1
    }
  }]
}
```

Common sdcpp-sd1 parameters:

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `model` | — ✅ | AIR URN | SD1 checkpoint. See the [catalog](https://civitai.com/models?baseModels=SD+1.5). |
| `prompt` | — ✅ | ≤ 10 000 chars | Booru-style tags work best. Lead with quality tags (`masterpiece, best quality, …`). |
| `negativePrompt` | *(none)* | ≤ 10 000 chars | Strongly recommended on SD1 — `worst quality, low quality, blurry, bad anatomy, bad hands` is a solid starting point. |
| `width` / `height` | `512` | `64`–`2048`, divisible by 16 | SD1's native resolution is 512×512. Aspect-ratio variants like 512×768 or 768×512 work; going much bigger often produces duplicated subjects. |
| `cfgScale` | `7` | `0`–`30` | `6`–`8` is the sweet spot for SD1. |
| `steps` | `20` | `1`–`150` | `20`–`30` typical; `30`+ rarely helps. |
| `sampleMethod` | `euler` | enum | [`SdCppSampleMethod`](/orchestration/reference/). |
| `schedule` | `discrete` | enum | [`SdCppSchedule`](/orchestration/reference/). |
| `clipSkip` | `-1` | int | `-1` = model default. `2` is a common hand-tuned value on many SD1 checkpoints — try it if output feels stiff. |
| `vaeModel` | *(checkpoint VAE)* | AIR URN | Override baked-in VAE. Rarely needed. |
| `loras` | `{}` | `{ airUrn: strength }` | Stack multiple; `0.6`–`1.0` strengths typical. |
| `embeddings` | `[]` | array of AIR URNs | Textual inversions. Reference them in the prompt / negative prompt by their embedding name. |
| `quantity` | `1` | `1`–`12` | Number of images per call. |
| `seed` | random | int64 | Pin for reproducibility. |
| `uCache` | *(default)* | enum | [`SdCppUCacheMode`](/orchestration/reference/). Caching strategy; leave default unless you know you want otherwise. |
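
The `width` / `height` constraints in the table can be checked before submitting. The following is a small Python sketch that encodes only the documented range (`64`–`2048`) and the divisible-by-16 rule; the function name is hypothetical and the orchestrator may apply further checks not listed here.

```python
from typing import List


def dimension_problems(width: int, height: int) -> List[str]:
    """Return human-readable problems; an empty list means both values pass."""
    problems = []
    for name, value in (("width", width), ("height", height)):
        if not 64 <= value <= 2048:
            problems.append(f"{name}={value} is outside 64-2048")
        elif value % 16 != 0:
            problems.append(f"{name}={value} is not divisible by 16")
    return problems
```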

### With LoRAs

LoRAs are a map of AIR URN → strength. Style LoRAs usually want `0.6`–`1.0`; character / concept LoRAs often sit higher.

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "sd1",
      "operation": "createImage",
      "model": "urn:air:sd1:checkpoint:civitai:4384@128713",
      "prompt": "masterpiece, best quality, cyberpunk street at night, neon signs",
      "negativePrompt": "worst quality, low quality, blurry",
      "width": 512,
      "height": 768,
      "cfgScale": 7,
      "steps": 25,
      "loras": {
        "urn:air:sd1:lora:civitai:123456@789012": 0.8
      }
    }
  }]
}
```

### Image-to-image (`createVariant`)

Pass a source image plus a new prompt, and the model re-imagines it. `strength` controls how far the output departs from the source: `0.0` returns the source unchanged, `1.0` discards it entirely. `0.6`–`0.8` is the sweet spot.

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "sd1",
      "operation": "createVariant",
      "model": "urn:air:sd1:checkpoint:civitai:4384@128713",
      "prompt": "masterpiece, best quality, daytime with clear blue sky",
      "negativePrompt": "worst quality, low quality",
      "width": 512,
      "height": 512,
      "cfgScale": 7,
      "steps": 20,
      "image": "https://image.civitai.com/.../source.jpeg",
      "strength": 0.7
    }
  }]
}
```

Note `image` is a plain string URL (not a `{ url: ... }` wrapper), and the field is `strength` (not `denoiseStrength` like on Comfy).

## Comfy (alternative path)

Use `engine: "comfy"` when you specifically need a ComfyUI sampler/scheduler enum that sdcpp doesn't expose.

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "comfy",
      "ecosystem": "sd1",
      "operation": "createImage",
      "model": "urn:air:sd1:checkpoint:civitai:4384@128713",
      "prompt": "masterpiece, best quality, 1girl, solo, portrait, looking at viewer",
      "negativePrompt": "worst quality, low quality, blurry, bad hands",
      "width": 512,
      "height": 512,
      "steps": 30,
      "cfgScale": 7,
      "sampler": "euler_ancestral",
      "scheduler": "normal",
      "clipSkip": 2
    }
  }]
}
```

Key differences from sdcpp:

| Field | sdcpp | comfy |
|-------|-------|-------|
| Sampler | `sampleMethod` ([`SdCppSampleMethod`](/orchestration/reference/)) | `sampler` ([`ComfySampler`](/orchestration/reference/)) |
| Schedule | `schedule` ([`SdCppSchedule`](/orchestration/reference/)) | `scheduler` ([`ComfyScheduler`](/orchestration/reference/)) |
| Img2img strength | `strength` (default `0.7`) | `denoiseStrength` (default `0.75`) |
| Default `steps` | `20` | `30` |
| Default `clipSkip` | `-1` (model default) | `2` |
| `embeddings` array | ✅ | — |
| `uCache` | ✅ | — |

Comfy also supports `createVariant` with `image` (plain string URL) + `denoiseStrength`; see the [`ComfySd1VariantImageGenInput` schema](/orchestration/reference/).
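
Because the two engines reject each other's field names, a quick client-side check can catch a mixed-up input before it comes back as a `400`. This Python sketch encodes only the fields named in the difference table above; the `ENGINE_ONLY_FIELDS` mapping and `misplaced_fields` helper are illustrative, not part of the API, and the full schemas may contain more engine-specific fields.

```python
from typing import Any, Dict, List

# Fields the difference table lists as belonging to only one engine.
ENGINE_ONLY_FIELDS = {
    "sdcpp": {"sampleMethod", "schedule", "strength", "embeddings", "uCache"},
    "comfy": {"sampler", "scheduler", "denoiseStrength"},
}


def misplaced_fields(step_input: Dict[str, Any]) -> List[str]:
    """Return fields that belong to the other engine, sorted by name."""
    engine = step_input["engine"]
    if engine not in ENGINE_ONLY_FIELDS:
        raise ValueError(f"unknown engine: {engine!r}")
    other = "comfy" if engine == "sdcpp" else "sdcpp"
    return sorted(ENGINE_ONLY_FIELDS[other].intersection(step_input))
```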

## Reading the result

Both engines emit the standard `imageGen` output — an `images[]` array, one entry per `quantity`:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.jpeg" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
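
Pulling the signed URLs out of a response shaped like the example above takes only a short loop. This is a minimal Python sketch that assumes the response has exactly the structure shown (`steps[].output.images[].url`); the `image_urls` helper is hypothetical.

```python
from typing import Any, Dict, List


def image_urls(workflow: Dict[str, Any]) -> List[str]:
    """Collect image URLs from every succeeded imageGen step."""
    urls = []
    for step in workflow.get("steps", []):
        if step.get("$type") != "imageGen" or step.get("status") != "succeeded":
            continue  # skip other step types and unfinished steps
        for image in step.get("output", {}).get("images", []):
            urls.append(image["url"])
    return urls
```

Download the bytes promptly after calling it, since the URLs it returns are the expiring signed ones.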

## Runtime

SD1 at 512×512 is one of the fastest image-gen paths available — typical wall time 3–10 s per image (sdcpp) or 5–15 s (comfy, slightly heavier). `wait=60` works comfortably for `quantity ≤ 4`. Going beyond 768px on either axis compounds runtime quadratically; submit with `wait=0` and poll for large batches or dimensions.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Both engines use the same per-pixel, per-step formula; only the reference values differ:

```
total = base × (width × height / referencePixels) × (steps / referenceSteps) × quantity
```

| Path | `base` | `referencePixels` | `referenceSteps` | Defaults → Buzz |
|------|--------|-------------------|------------------|-----------------|
| `sdcpp + sd1` | `4` | 512² | `20` | 512²/`steps: 20`/`q: 1` → **~4 Buzz** |
| `comfy + sd1` | `4` | 512² | `30` | 512²/`steps: 30`/`q: 1` → **~4 Buzz** |

Examples:

* 512² at `quantity: 4` → ~16 Buzz
* 768×512 at `steps: 25` → ~4 × 1.5 × 1.25 = **~7.5 Buzz** (sdcpp)
* 512² at `steps: 40` → **~8 Buzz** (sdcpp)

SD1 is the cheapest per-image option Civitai exposes — native 512² defaults keep the formula at 1.0 on each axis.
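
The formula and worked examples above can be reproduced in a few lines of Python. This is a sketch using only the reference values from the table; the function name and the `steps=None` convention are illustrative, and the orchestrator's own rounding is not described here, so treat `whatif=true` as the authoritative figure.

```python
from typing import Optional

BASE_BUZZ = 4
REFERENCE_PIXELS = 512 * 512
REFERENCE_STEPS = {"sdcpp": 20, "comfy": 30}


def estimate_sd1_buzz(
    engine: str,
    width: int = 512,
    height: int = 512,
    steps: Optional[int] = None,
    quantity: int = 1,
) -> float:
    """Apply total = base x pixel ratio x step ratio x quantity."""
    reference_steps = REFERENCE_STEPS[engine]
    if steps is None:
        steps = reference_steps  # engine default, so the step ratio is 1.0
    pixel_ratio = (width * height) / REFERENCE_PIXELS
    step_ratio = steps / reference_steps
    return BASE_BUZZ * pixel_ratio * step_ratio * quantity
```

For example, `estimate_sd1_buzz("sdcpp", 768, 512, steps=25)` gives `7.5`, matching the second example in the list.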

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "model must match AIR pattern" | Passed a bare model ID or version slug | Use a full AIR URN: `urn:air:sd1:checkpoint:civitai:<modelId>@<versionId>`. |
| `400` with unknown property | Field not valid for this engine (e.g. `sampler` on sdcpp, `sampleMethod` on comfy, `denoiseStrength` on sdcpp) | Match the schema for your chosen engine — see the difference table above. |
| Output has duplicated subjects / Frankenstein anatomy | Dimensions too far from SD1's native 512×512 | Generate at 512 / 768 max; upscale with [`imageUpscaler`](./image-upscaler) if you need more resolution. |
| Output looks flat / low-contrast on SD1 | `clipSkip` at model default where the checkpoint expects `2` | Try `clipSkip: 2` — the convention for many SD1 community checkpoints. |
| Prompts feel ignored on SD1 | Missing quality boosters, or `cfgScale` too low | Lead the prompt with `masterpiece, best quality, …` tags; bump `cfgScale` toward `8`. |
| LoRA has no visible effect | Wrong AIR URN, model private / not published, or ecosystem mismatch | Verify the URN on the LoRA's Civitai page; only SD1-tagged LoRAs work on the `sd1` ecosystem. |
| Request timed out (`wait` expired) | Large `quantity`, non-native dimensions | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Prompt hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [SDXL image generation](./sdxl) — higher-resolution successor to SD1
* [Flux 2](./flux2) / [Flux 1](./flux1) image generation — newer open-weights families with stronger prompt adherence
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref` (use `ecosystem: "sd1"` on the enhancer)
* Full parameter catalog: the `Sd1CreateImageGenInput` / `Sd1VariantImageGenInput` / `ComfySd1CreateImageGenInput` / `ComfySd1VariantImageGenInput` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator

---

---
url: /orchestration/recipes/training-sdxl-sd1.md
---

# SDXL & SD1 LoRA training

Train a Stable Diffusion LoRA on your own image dataset using AI Toolkit. This page covers the two classic SD ecosystems:

| `ecosystem` | Output LoRA usable with | Buzz / epoch | Native resolution |
|-------------|------------------------|--------------|-------------------|
| `sdxl` | [SDXL image generation](./sdxl) | 50 | 1024² |
| `sd1` | [SD1 image generation](./sd1) | 50 | 512² |

Both are the cheapest training ecosystems on the platform, which makes them a good first pick when iterating on dataset choice or hyperparameters before stepping up to a more expensive model family. SDXL is the better default for new fine-tunes; SD1 is mostly useful when you specifically need an SD 1.5 LoRA (e.g. to deploy onto an existing SD 1.5 product).

## The request shape

Every training request is a single `training` step with an `engine` and `ecosystem` selector:

```json
{
  "$type": "training",
  "input": {
    "engine":    "ai-toolkit",
    "ecosystem": "sdxl"          // sdxl | sd1
  }
}
```

The input shape is shared with the other AI Toolkit ecosystems — see [Common parameters](#common-parameters) below. The fields documented here are the SD-family-specific ones (`model`, `minSnrGamma`, `triggerWord`, default text-encoder behavior).

::: tip Long-running step
Training takes minutes to hours depending on dataset size and epoch count. Always submit with `wait=0` and follow up with polling or a webhook — see [Results & webhooks](/orchestration/guide/results-and-webhooks). The `<RecipeRun>` widgets below preview cost via `whatif=true` so you can see the price without kicking off an actual training run.
:::

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A training-data zip uploaded to a Civitai-reachable URL. Three accepted forms:
  * A signed `https://civitai-delivery-worker-prod.*.r2.cloudflarestorage.com/...` URL
  * A Civitai R2 AIR (e.g. `urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/.../...zip`)
  * Any HTTPS URL that returns the zip without auth
* An accurate `count` of images in the zip — used to compute `numberOfRepeats` defaults and batch sizing

## SDXL

Stable Diffusion XL trains at 1024² and produces a LoRA usable with any SDXL checkpoint. The base checkpoint defaults to `urn:air:sdxl:checkpoint:civitai:101055@128078` (Juggernaut XL); override `model` to train on top of a different SDXL checkpoint.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "sdxl",
      "model": "urn:air:sdxl:checkpoint:civitai:101055@128078",
      "epochs": 10,
      "numberOfRepeats": 14,
      "lr": 0.0005,
      "textEncoderLr": 5e-5,
      "trainTextEncoder": true,
      "lrScheduler": "cosine",
      "optimizerType": "adafactor",
      "networkDim": 32,
      "networkAlpha": 32,
      "noiseOffset": 0.1,
      "minSnrGamma": 5,
      "triggerWord": "cat",
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/6/2783465TrainingData.koBd.zip",
        "count": 15
      },
      "samples": {
        "prompts": [
          "catz no humans, cat, whiskers, animal focus, looking at viewer, animal, solo, yellow eyes",
          "catz no humans, candle, pokemon (creature), blurry, animal focus, solo, food, bird, standing",
          "catz no humans, food, noodles, bowl, cat, green eyes, tongue, food focus, ramen, chopsticks"
        ]
      }
    }
  }]
}
```

SDXL-specific fields:

| Field | Default | Range / values | Notes |
|-------|---------|----------------|-------|
| `model` | `urn:air:sdxl:checkpoint:civitai:101055@128078` | Any SDXL `checkpoint` AIR | The base checkpoint your LoRA is trained on top of. |
| `minSnrGamma` | `5` | `0`–`20` | Min-SNR-γ stabilization. `5` is a good default; lower values can speed up convergence on simple subjects. |
| `triggerWord` | *(none)* | string | Optional prompt token that activates the trained LoRA. Recommended for character / style LoRAs. |
| `trainTextEncoder` | `true` | bool | SDXL benefits noticeably from text-encoder training. Disable to halve memory cost at the price of prompt-following quality. |
| `textEncoderLr` | `5e-5` | `0`–`1` | Only used when `trainTextEncoder: true`. |
| `optimizerType` | `adafactor` | see [enum](#common-parameters) | `adafactor` is the SDXL house default; `adamw8bit` works too if you have memory headroom. |

## SD1

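Stable Diffusion 1.5 trains at 512² and produces a LoRA usable with any SD1.5 checkpoint. The base checkpoint defaults to `urn:air:sd1:checkpoint:civitai:127227@139180`. The input has the same shape as SDXL, with three differences in defaults: `optimizerType` defaults to `adamw8bit` (not `adafactor`), `noiseOffset` defaults to `0` (not `0.1`), and the auto-computed `numberOfRepeats` is `ceil(400 / count)` (not `ceil(200 / count)`). The example below sets `noiseOffset` and `networkAlpha` explicitly rather than relying on those defaults.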

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "sd1",
      "model": "urn:air:sd1:checkpoint:civitai:127227@139180",
      "epochs": 100,
      "lr": 0.0005,
      "textEncoderLr": 5e-5,
      "trainTextEncoder": true,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 16,
      "noiseOffset": 0.1,
      "minSnrGamma": 5,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/6/2707938TrainingData.Ac22.zip",
        "count": 8
      },
      "samples": {
        "prompts": [
          "no humans, cat, whiskers, animal focus, looking at viewer, animal, solo, yellow eyes",
          "no humans, food, tiger, cake, animal focus, food focus, whiskers, black background",
          "no humans, fruit, food, bug, black background, wings, food focus, orange (fruit), antennae"
        ]
      }
    }
  }]
}
```

SD1 uses the same `model` / `minSnrGamma` / `triggerWord` / `trainTextEncoder` / `textEncoderLr` fields as SDXL. The defaults that differ are:

| Field | SD1 default | SDXL default |
|-------|-------------|--------------|
| `optimizerType` | `adamw8bit` | `adafactor` |
| `noiseOffset` | `0` | `0.1` |
| `numberOfRepeats` (auto) | `ceil(400 / count)` | `ceil(200 / count)` |
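
The auto-computed `numberOfRepeats` in the last row is a one-line calculation. The Python sketch below reproduces it from the two formulas in the table; the helper name is hypothetical, and whether the service clamps the result to the documented `1`–`5000` range is not stated here, so no clamping is applied.

```python
import math

# Numerators from the table: ceil(400 / count) for SD1, ceil(200 / count) for SDXL.
AUTO_REPEAT_TARGET = {"sd1": 400, "sdxl": 200}


def auto_number_of_repeats(ecosystem: str, count: int) -> int:
    """Return the default repeats for a dataset of `count` images."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return math.ceil(AUTO_REPEAT_TARGET[ecosystem] / count)
```

With the SDXL example's 15 images this returns `14`, the same value that request passes explicitly.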

## Common parameters {#common-parameters}

These apply to every AI Toolkit training input regardless of ecosystem. Defaults shown are the post-`ApplyDefaults` values for SDXL; some ecosystems override individual defaults (the SD1 / SDXL differences above are the main examples).

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `engine` | ✅ | — | Always `ai-toolkit` for these recipes. |
| `ecosystem` | ✅ | — | `sdxl` or `sd1` for this page. |
| `epochs` | | `5` | `1`–`20`. One pass through the dataset = one epoch. Billed per epoch (see [Cost](#cost)). |
| `numberOfRepeats` | | auto | `1`–`5000`. Per-image repeats inside one epoch. Auto-computed from dataset size when omitted. |
| `lr` | | `0.0001` | `0`–`1`. UNet learning rate. `0.0005` is typical for character/style LoRAs on SDXL. |
| `textEncoderLr` | | — | Only used when `trainTextEncoder: true`. SDXL/SD1 default to `5e-5`. |
| `trainTextEncoder` | | `true` (SDXL/SD1) | Train CLIP alongside the diffuser. |
| `lrScheduler` | | `cosine` | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. |
| `optimizerType` | | `adafactor` (SDXL) / `adamw8bit` (SD1) | `adamw`, `adamw8bit`, `adam8bit`, `lion`, `lion8bit`, `adafactor`, `adagrad`, `prodigy`, `prodigy8bit`, `automagic`. |
| `networkDim` | | `32` | `1`–`256`. LoRA rank. Higher = more capacity, larger LoRA file. |
| `networkAlpha` | | matches `networkDim` | `1`–`256`. Scales the effective learning rate via `alpha / dim`. |
| `noiseOffset` | | `0.1` (SDXL) / `0` (SD1) | `0`–`1`. Adds noise during training to improve dark/bright sample handling. |
| `flipAugmentation` | | `false` | Random horizontal flips. Useful for symmetric subjects. |
| `shuffleTokens` | | `false` | Randomize caption-tag order. |
| `keepTokens` | | `0` | `0`–`10`. When `shuffleTokens: true`, keep the first N tokens fixed. |
| `triggerWord` | | *(none)* | Activation token recommended for character/style LoRAs. |
| `trainingData.type` | ✅ | — | `zip` (only currently supported type). |
| `trainingData.sourceUrl` | ✅ | — | Signed HTTPS URL or Civitai R2 AIR. |
| `trainingData.count` | ✅ | — | Number of images in the zip. |
| `samples.prompts[]` | | `[]` | Up to a handful of preview prompts rendered after each epoch with the trained LoRA at strength 1.0. Empty entries are skipped. |
| `samples.negativePrompt` | | *(none)* | Applied to all sample prompts. |

## Reading the result

Submitting with `wait=0` returns immediately with `status: processing`. Poll [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) (or use a webhook — see [Results & webhooks](/orchestration/guide/results-and-webhooks)) until the step settles. A successful step produces one `epochs[]` entry per epoch; each contains the trained LoRA blob and any sample images:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "training",
    "status": "succeeded",
    "output": {
      "moderationStatus": "Approved",
      "epochs": [
        {
          "epochNumber": 1,
          "model": { "id": "blob_...", "url": "https://.../epoch_1.safetensors" },
          "samples": [
            { "id": "blob_...", "url": "https://.../sample_0.jpeg" },
            { "id": "blob_...", "url": "https://.../sample_1.jpeg" }
          ]
        },
        { "epochNumber": 2, "model": { "...": "..." }, "samples": [] }
      ]
    }
  }]
}
```

The blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL when downloading the trained LoRA.

`moderationStatus` reflects safety review of the dataset: `Approved` is the green-light case. `Rejected` means the run was halted because the dataset failed moderation.
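
To grab a trained LoRA from a response shaped like the example above, a short helper is enough. This Python sketch assumes exactly that structure (`steps[].output.epochs[]` with `epochNumber` and `model.url`) and picks the highest-numbered epoch; the helper name is hypothetical, and the last epoch is not necessarily the best one, so compare the sample images before choosing.

```python
from typing import Any, Dict, Optional


def latest_epoch_model_url(workflow: Dict[str, Any]) -> Optional[str]:
    """Return the model URL of the highest-numbered epoch, or None."""
    for step in workflow.get("steps", []):
        if step.get("$type") != "training":
            continue
        epochs = step.get("output", {}).get("epochs", [])
        # Keep only epochs that actually carry a downloadable model blob.
        with_model = [e for e in epochs if (e.get("model") or {}).get("url")]
        if with_model:
            newest = max(with_model, key=lambda e: e["epochNumber"])
            return newest["model"]["url"]
    return None
```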

## Runtime

Training time is highly variable and depends on `epochs × numberOfRepeats × count`. Always use `wait=0`.

| Ecosystem | Per-epoch wall time (15 imgs, default settings) | Typical full run |
|-----------|------------------------------------------------|------------------|
| `sdxl` | ~30–60 s | 5–15 min for 10 epochs |
| `sd1` | ~10–25 s | 3–10 min for 100 epochs (SD1 is much faster per step) |

Queue time on busy days can dominate — use the workflow's `events[]` to see when the step actually started.

## Cost

Billed in Buzz (1 Buzz ≈ 1/6 second of GPU). Both SDXL and SD1 are flat-rate per epoch:

```
total = costPerEpoch × epochs
costPerEpoch = 50 (sdxl), 50 (sd1)
```

Sample-prompt rendering is billed separately at standard SDXL / SD1 image-generation rates. Use `whatif=true` (the default for the **Preview cost** button on the `<RecipeRun>` widgets above) to see the exact pre-flight charge before committing.

| Configuration | Buzz |
|---------------|------|
| SDXL, `epochs: 10` | 500 + samples |
| SDXL, `epochs: 5` | 250 + samples |
| SD1, `epochs: 100` | 5000 + samples |
| SD1, `epochs: 20` | 1000 + samples |
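
The flat-rate part of the bill can be computed directly from the formula above. This Python sketch covers only the per-epoch charge; sample rendering is billed separately and is deliberately left out, and the helper name is illustrative.

```python
COST_PER_EPOCH = {"sdxl": 50, "sd1": 50}


def training_buzz(ecosystem: str, epochs: int) -> int:
    """Return costPerEpoch x epochs, excluding sample-image charges."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    return COST_PER_EPOCH[ecosystem] * epochs
```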

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "epochs out of range" | `epochs` outside `1`–`20` | The hard cap is 20. If an SD1 run needs more epochs than that (rare), submit multiple training workflows and chain them. |
| `400` with "model not found" | `model` URN points at a checkpoint that isn't the right ecosystem (e.g. an SD1 model on `ecosystem: "sdxl"`) | Use a `urn:air:sdxl:checkpoint:...` AIR for SDXL; `urn:air:sd1:checkpoint:...` for SD1. |
| `400` with "trainingData.sourceUrl not reachable" | Signed URL expired, or zip behind auth | R2 signed URLs expire — regenerate. Prefer Civitai R2 AIRs for stable references. |
| `400` with "count mismatch" | `trainingData.count` doesn't match the actual image count in the zip | Inspect the zip contents and update `count`. |
| Step `failed`, output `moderationStatus: "Rejected"` | Dataset failed automated content moderation | Replace flagged images and resubmit. Don't retry the same dataset. |
| Trained LoRA looks under-trained | Too few steps for the dataset | Raise `epochs` or `numberOfRepeats`, or increase `lr`. |
| Trained LoRA overfits / memorizes | Too many steps, dim too high, or alpha = dim with high `lr` | Lower `epochs`, drop `networkDim` to 16–24, or set `networkAlpha` to half of `networkDim`. |

## Related

* [Flux 1 LoRA training](./training-flux1) — open-weights Flux LoRAs at higher quality and higher cost
* [Flux 2 Klein LoRA training](./training-flux2-klein) — current Flux generation
* [Chroma / ERNIE / Qwen / Z-Image LoRA training](./training-other-image) — smaller image ecosystems
* [SDXL image generation](./sdxl) / [SD1 image generation](./sd1) — use a trained LoRA via `loras: { "<lora-air>": 1.0 }`
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running training jobs
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) / [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — the operations behind the examples here
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/training/openapi.yaml) — standalone OpenAPI 3.1 YAML for the `training` endpoint

---

---
url: /orchestration/recipes/sdxl.md
---

# SDXL image generation

SDXL is Stable Diffusion XL — the higher-resolution successor to SD1. Native resolution **1024×1024**, massive community catalog, prompt style sits between SD1's Booru tags and Flux's natural language (tags still help; full sentences work too). Two invocation paths on the orchestrator:

| `engine` | Best for | Notes |
|----------|----------|-------|
| `sdcpp` (ecosystem `sdxl`) | **Default** — Stable Diffusion C++ on Civitai workers | `sampleMethod` + `schedule` sdcpp enums, textual-inversion embeddings, `uCache` mode. |
| `comfy` (ecosystem `sdxl`) | When you specifically need ComfyUI sampler/scheduler enums or a Comfy-only feature | `sampler` + `scheduler` Comfy enums, `denoiseStrength` (vs sdcpp's `strength`) on variants. |

**Default choice for new integrations**: `engine: "sdcpp"`, `ecosystem: "sdxl"`. Reach for `comfy` only when you specifically need a ComfyUI-exclusive sampler (`dpmpp_2m`, `dpmpp_sde`, etc.) or scheduler (`karras`).

Both engines support `createImage` and `createVariant` (img2img). Neither supports `editImage` — use [Flux 1 Kontext](./flux1#flux1-kontext-managed-editing-tier) or [Flux 2 Klein](./flux2#klein-createvariant-img2img) if you need prompt-driven editing.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* An SDXL checkpoint AIR URN — browse the [Civitai SDXL catalog](https://civitai.com/models?baseModels=SDXL+1.0)
* For `createVariant`: a source image URL

## sdcpp (default path)

### Text-to-image

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "sdxl",
      "operation": "createImage",
      "model": "urn:air:sdxl:checkpoint:civitai:101055@128078",
      "prompt": "masterpiece, best quality, 1girl, solo, landscape, sunset, cinematic lighting, highly detailed",
      "negativePrompt": "worst quality, low quality, blurry",
      "width": 1024,
      "height": 1024,
      "cfgScale": 7,
      "steps": 25
    }
  }]
}
```

Common sdcpp-sdxl parameters:

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `model` | — ✅ | AIR URN | SDXL checkpoint. See the [catalog](https://civitai.com/models?baseModels=SDXL+1.0). |
| `prompt` | — ✅ | ≤ 10 000 chars | Tag-style or natural language. Quality tags (`masterpiece, best quality, …`) still help. |
| `negativePrompt` | *(none)* | ≤ 10 000 chars | Recommended. `worst quality, low quality, blurry` is a solid starting point. |
| `width` / `height` | `1024` | `64`–`2048`, divisible by 16 | SDXL's native resolution is 1024×1024. Well-behaved aspect ratios: 1024×1024, 1152×896, 896×1152, 1216×832, 832×1216, 1344×768, 768×1344, 1536×640, 640×1536. |
| `cfgScale` | `7` | `0`–`30` | `6`–`8` works for most SDXL checkpoints; LCM/Turbo variants want `1`–`2`. |
| `steps` | `20` | `1`–`150` | `20`–`30` typical. LCM/Turbo checkpoints need fewer (`4`–`8`). |
| `sampleMethod` | `euler` | enum | [`SdCppSampleMethod`](/orchestration/reference/). |
| `schedule` | `discrete` | enum | [`SdCppSchedule`](/orchestration/reference/). |
| `vaeModel` | *(checkpoint VAE)* | AIR URN | Override baked-in VAE. Rarely needed. |
| `loras` | `{}` | `{ airUrn: strength }` | Stack multiple; `0.6`–`1.0` strengths typical. |
| `embeddings` | `[]` | array of AIR URNs | Textual inversions. Reference them in the prompt / negative prompt by their embedding name. |
| `quantity` | `1` | `1`–`12` | Number of images per call. |
| `seed` | random | int64 | Pin for reproducibility. |
| `uCache` | *(default)* | enum | [`SdCppUCacheMode`](/orchestration/reference/). Caching strategy; leave default unless you know you want otherwise. |

::: tip No `clipSkip` on SDXL
Unlike SD1, SDXL doesn't expose a `clipSkip` parameter: the model's dual text encoders make the SD1 clip-skip convention meaningless here. Sending `clipSkip` on an SDXL request returns a `400`.
:::

### Aspect-ratio variants

SDXL handles off-square aspect ratios well as long as you stay near ~1 megapixel total area. Go too wide or too tall and composition artifacts (duplicated subjects, "mirrored twin" effects) start to appear.

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "sdxl",
      "operation": "createImage",
      "model": "urn:air:sdxl:checkpoint:civitai:101055@128078",
      "prompt": "masterpiece, best quality, wide panoramic view of a futuristic city at dusk",
      "negativePrompt": "worst quality, low quality",
      "width": 1536,
      "height": 640,
      "cfgScale": 7,
      "steps": 25
    }
  }]
}
```
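
The "stay near ~1 megapixel" guidance can be turned into a rough pre-flight check. This Python sketch is an approximation only: the 10% tolerance is an arbitrary assumption chosen so that every resolution in the parameter table passes, and the function names are hypothetical. It checks total area, not aspect ratio, so an extreme shape with the right area still passes.

```python
NATIVE_PIXELS = 1024 * 1024


def megapixels(width: int, height: int) -> float:
    """Area relative to SDXL's native 1024x1024."""
    return (width * height) / NATIVE_PIXELS


def near_native_area(width: int, height: int, tolerance: float = 0.10) -> bool:
    """True when the area is within `tolerance` of the native pixel count."""
    return abs(megapixels(width, height) - 1.0) <= tolerance
```

For instance, `megapixels(1536, 640)` is `0.9375`, which is why the panoramic example above still behaves well.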

### With LoRAs

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "sdxl",
      "operation": "createImage",
      "model": "urn:air:sdxl:checkpoint:civitai:101055@128078",
      "prompt": "masterpiece, best quality, portrait of a warrior in ornate armor",
      "negativePrompt": "worst quality, low quality, blurry",
      "width": 1024,
      "height": 1024,
      "cfgScale": 7,
      "steps": 25,
      "loras": {
        "urn:air:sdxl:lora:civitai:123456@789012": 0.8
      }
    }
  }]
}
```

### Image-to-image (`createVariant`)

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "sdxl",
      "operation": "createVariant",
      "model": "urn:air:sdxl:checkpoint:civitai:101055@128078",
      "prompt": "masterpiece, best quality, daytime with clear blue sky",
      "negativePrompt": "worst quality, low quality",
      "width": 1024,
      "height": 1024,
      "cfgScale": 7,
      "steps": 25,
      "image": "https://image.civitai.com/.../source.jpeg",
      "strength": 0.7
    }
  }]
}
```

`strength` controls how far the output departs from the source: `0.0` returns the source unchanged, `1.0` discards it entirely. `0.6`–`0.8` is the sweet spot for "keep composition, change style". Note that `image` is a plain string URL (not a `{ url: ... }` wrapper), and the field is `strength` (not `denoiseStrength` as on Comfy).

## Comfy (alternative path)

Use `engine: "comfy"` when you need a ComfyUI-specific sampler — `dpmpp_2m` paired with the `karras` scheduler is a popular combo on SDXL for smoother detail:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "comfy",
      "ecosystem": "sdxl",
      "operation": "createImage",
      "model": "urn:air:sdxl:checkpoint:civitai:101055@128078",
      "prompt": "masterpiece, best quality, 1girl, solo, landscape, sunset, cinematic lighting",
      "negativePrompt": "worst quality, low quality, blurry",
      "width": 1024,
      "height": 1024,
      "steps": 30,
      "cfgScale": 7,
      "sampler": "dpmpp_2m",
      "scheduler": "karras"
    }
  }]
}
```

Key differences from sdcpp:

| Field | sdcpp | comfy |
|-------|-------|-------|
| Sampler | `sampleMethod` ([`SdCppSampleMethod`](/orchestration/reference/)) | `sampler` ([`ComfySampler`](/orchestration/reference/)) |
| Schedule | `schedule` ([`SdCppSchedule`](/orchestration/reference/)) | `scheduler` ([`ComfyScheduler`](/orchestration/reference/)) |
| Img2img strength | `strength` (default `0.7`) | `denoiseStrength` (default `0.75`) |
| Default `steps` | `20` | `30` |
| `embeddings` array | ✅ | — |
| `uCache` | ✅ | — |

Comfy also supports `createVariant` with `image` (plain string URL) + `denoiseStrength`; see the [`ComfySdxlVariantImageGenInput` schema](/orchestration/reference/).
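
The naming differences in the table can be hidden behind a small helper so calling code stays engine-neutral. The following is a minimal sketch rather than part of any SDK: `sdxl_input` and its keyword names are hypothetical, and only the output field names (`sampleMethod`, `schedule`, `strength`, `sampler`, `scheduler`, `denoiseStrength`) come from the table above.

```python
from typing import Any, Optional

# Engine-specific field names, taken from the difference table above.
_FIELD_NAMES = {
    "sdcpp": {"sampler": "sampleMethod", "schedule": "schedule", "strength": "strength"},
    "comfy": {"sampler": "sampler", "schedule": "scheduler", "strength": "denoiseStrength"},
}


def sdxl_input(engine: str, *, sampler: Optional[str] = None,
               schedule: Optional[str] = None, strength: Optional[float] = None,
               **common: Any) -> dict:
    """Build an SDXL imageGen input using the field names the chosen engine expects."""
    names = _FIELD_NAMES[engine]  # raises KeyError for an unknown engine
    payload = {"engine": engine, "ecosystem": "sdxl", **common}
    for key, value in (("sampler", sampler), ("schedule", schedule), ("strength", strength)):
        if value is not None:  # omit unset fields so the API defaults apply
            payload[names[key]] = value
    return payload
```

Fields shared by both engines (`model`, `prompt`, `width`, and so on) pass through unchanged via `**common`.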

## Reading the result

Both engines emit the standard `imageGen` output: an `images[]` array with one entry per requested image (`quantity`):

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.jpeg" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
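
As an illustration of walking that response shape, the sketch below collects image URLs from a parsed workflow. The function name `image_urls` is hypothetical; it relies only on the `steps[].output.images[].url` structure shown above and tolerates entries without a `url`.

```python
def image_urls(workflow: dict) -> list:
    """Collect every image URL from the succeeded imageGen steps of a workflow."""
    urls = []
    for step in workflow.get("steps", []):
        if step.get("$type") != "imageGen" or step.get("status") != "succeeded":
            continue
        for image in (step.get("output") or {}).get("images", []):
            if image.get("url"):  # skip blobs returned without a URL
                urls.append(image["url"])
    return urls
```

Because the URLs are signed and expire, download or re-sign them soon after extracting them.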

## Runtime

SDXL at 1024×1024 typically finishes in 10–25 s (sdcpp) or 15–35 s (comfy). `wait=60` works comfortably for `quantity ≤ 2`. LCM/Turbo checkpoints at `steps: 4`–`8` finish in 3–8 s and support higher `quantity` inside the same window. For larger batches or atypical aspect ratios, submit with `wait=0` and poll.
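
The "submit with `wait=0` and poll" pattern might be sketched as below. This is an assumption-laden outline: `poll_workflow` is a hypothetical helper, `fetch` stands in for whatever call you use to retrieve the workflow (for example a `GetWorkflow` request), and only the `succeeded` and `failed` statuses appear in this guide, so extend `terminal` if your integration sees others.

```python
import time
from typing import Callable


def poll_workflow(fetch: Callable[[], dict], *, interval: float = 2.0,
                  timeout: float = 300.0, terminal=("succeeded", "failed"),
                  sleep=time.sleep, clock=time.monotonic) -> dict:
    """Call `fetch` until the workflow status is terminal or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        workflow = fetch()
        if workflow.get("status") in terminal:
            return workflow
        if clock() >= deadline:
            raise TimeoutError("workflow did not reach a terminal status in time")
        sleep(interval)  # fixed delay; add backoff if you are being rate limited
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays.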

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Both engines use the same per-pixel / per-step formula; only `referenceSteps` differs:

```
total = base × (width × height / referencePixels) × (steps / referenceSteps) × quantity
```

| Path | `base` | `referencePixels` | `referenceSteps` | Defaults → Buzz |
|------|--------|-------------------|------------------|-----------------|
| `sdcpp + sdxl` | `8` | 1024² | `20` | 1024²/`steps: 20`/`q: 1` → **~8 Buzz** |
| `comfy + sdxl` | `8` | 1024² | `30` | 1024²/`steps: 30`/`q: 1` → **~8 Buzz** |

Examples:

* 1024² at default `steps`, `quantity: 4` → **~32 Buzz**
* 1344×768 at `steps: 25` → ~8 × 0.98 × 1.25 ≈ **~10 Buzz** (sdcpp)
* 1024² at `steps: 35` → **~14 Buzz** (sdcpp)
* 1536×640 at `steps: 25` → ~8 × 0.94 × 1.25 ≈ **~9 Buzz** (sdcpp)

Atypical aspect ratios still bill by total pixel area, so a 2:1 panorama at the same megapixel count costs the same as a 1:1 image at 1024².
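
For a quick client-side estimate, the formula and reference values above translate directly into code. This is a sketch only: `estimate_sdxl_buzz` is a hypothetical name, it returns an unrounded figure because the rounding the service applies is not specified here, and `whatif=true` remains the authoritative preview.

```python
BASE_BUZZ = 8
REFERENCE_PIXELS = 1024 * 1024
REFERENCE_STEPS = {"sdcpp": 20, "comfy": 30}  # from the table above


def estimate_sdxl_buzz(width: int, height: int, steps: int,
                       quantity: int = 1, engine: str = "sdcpp") -> float:
    """Apply base x pixel ratio x step ratio x quantity; result is not rounded."""
    pixel_ratio = (width * height) / REFERENCE_PIXELS
    step_ratio = steps / REFERENCE_STEPS[engine]
    return BASE_BUZZ * pixel_ratio * step_ratio * quantity
```

For instance, `estimate_sdxl_buzz(1344, 768, steps=25)` evaluates to `9.84375`, in line with the ~10 Buzz example above.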

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "model must match AIR pattern" | Passed a bare model ID or version slug | Use a full AIR URN: `urn:air:sdxl:checkpoint:civitai:<modelId>@<versionId>`. |
| `400` with "clipSkip is not a valid property" on SDXL | `clipSkip` doesn't exist on SDXL (it's an SD1 knob) | Remove the field. SDXL uses dual text encoders; there's nothing to skip. |
| `400` with unknown property | Field not valid for this engine (e.g. `sampler` on sdcpp, `sampleMethod` on comfy, `denoiseStrength` on sdcpp) | Match the schema for your chosen engine — see the difference table above. |
| Output has duplicated subjects / mirrored-twin composition | Aspect ratio too far from 1:1 at fixed megapixel count | Stick to well-behaved SDXL ratios: 1024², 1152×896, 1344×768, 1536×640, and mirrors thereof. |
| Turbo/LCM checkpoint produces mush | `cfgScale` / `steps` tuned for base SDXL | Turbo/LCM want `cfgScale: 1`–`2` and `steps: 4`–`8`. |
| LoRA has no visible effect | Wrong AIR URN, model private / not published, or ecosystem mismatch | Verify the URN on the LoRA's Civitai page; only SDXL-tagged LoRAs work on the `sdxl` ecosystem. |
| Request timed out (`wait` expired) | Large `quantity`, atypical dimensions, busy worker | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Prompt hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [SD1 image generation](./sd1) — the 512×512 predecessor with the same two-engine pattern
* [Flux 2](./flux2) / [Flux 1](./flux1) image generation — newer families with stronger prompt adherence
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref` (use `ecosystem: "sdxl"` on the enhancer)
* Full parameter catalog: the `SdxlCreateImageGenInput` / `SdxlVariantImageGenInput` / `ComfySdxlCreateImageGenInput` / `ComfySdxlVariantImageGenInput` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator

---

---
url: /orchestration/recipes/seedream.md
---

# Seedream image generation

Seedream is ByteDance's image-generation family. Single engine, multiple versions, native high-resolution support up to **4096×4096**, and image editing via `images[]`:

| `version` | Notes |
|-----------|-------|
| `v3` | Earliest version. Compatibility only — prefer `v4.5` or newer. |
| `v4` | Balanced quality and speed; lower cost than `v4.5`. |
| `v4.5` | **Default** — refined v4, better detail. Returned when `version` is omitted. |
| `v5.0-lite` | Latest fast tier — lighter than v4.5 with similar output characteristics for most workloads. |

**Default choice for new integrations**: `version: "v4.5"` (also the API default when the field is omitted). Use `v4` when you want lower cost with slightly less detail; try `v5.0-lite` for faster output on the newest release at a lower price than `v4.5`.

Unlike most image engines exposed here, Seedream uses plain `width` / `height` (not an enum) and accepts very large outputs — up to 4096 px per side.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* For image editing: one or more source image URLs, data URLs, or Base64 strings

## Text-to-image

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "seedream",
      "version": "v4",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "width": 1024,
      "height": 1024,
      "guidanceScale": 2.5,
      "quantity": 1
    }
  }]
}
```

### Parameters

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `version` | `v4.5` | `v3` / `v4` / `v4.5` / `v5.0-lite` | Optional; defaults to `v4.5`. |
| `prompt` | — ✅ | ≥ 1 char | Natural-language works best. |
| `width` / `height` | `1024` | `256`–`4096` | Plain pixel dimensions. Stay near ~1 MP for native output; push higher only when you need print-size output. |
| `quantity` | `1` | `1`–`12` | |
| `guidanceScale` | `2.5` | `1`–`10` | Lower than SD1/SDXL's 7 — similar range to Flux. |
| `seed` | random | int32 | |
| `enableSafetyChecker` | `false` | boolean | |
| `images[]` | *(none)* | max 10 | Passing `images[]` switches to edit mode. URLs, data URLs, or Base64. |
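
The ranges in the table can be checked before submitting, which avoids a round trip for an avoidable `400`. The sketch below is a hypothetical client-side check (`seedream_problems` is not part of the API) that mirrors only the limits and defaults listed above; the server remains the source of truth.

```python
SEEDREAM_VERSIONS = {"v3", "v4", "v4.5", "v5.0-lite"}


def seedream_problems(inp: dict) -> list:
    """Return human-readable problems with a Seedream input; empty list means it looks valid."""
    problems = []
    if inp.get("version", "v4.5") not in SEEDREAM_VERSIONS:
        problems.append("version must be one of v3, v4, v4.5, v5.0-lite")
    if not inp.get("prompt"):
        problems.append("prompt is required (at least 1 character)")
    for side in ("width", "height"):
        if not 256 <= inp.get(side, 1024) <= 4096:
            problems.append(f"{side} must be within 256-4096")
    if not 1 <= inp.get("quantity", 1) <= 12:
        problems.append("quantity must be within 1-12")
    if not 1 <= inp.get("guidanceScale", 2.5) <= 10:
        problems.append("guidanceScale must be within 1-10")
    if len(inp.get("images", [])) > 10:
        problems.append("images accepts at most 10 entries")
    return problems
```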

### Newer version (`v5.0-lite`)

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "seedream",
      "version": "v5.0-lite",
      "prompt": "A serene mountain landscape with a crystal clear lake at dawn",
      "width": 1536,
      "height": 864,
      "guidanceScale": 2.5
    }
  }]
}
```

### High-resolution output

Seedream can render up to 4096×4096 natively — useful when you need print-size output without a separate upscaling pass. Expect higher runtime:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "seedream",
      "version": "v4.5",
      "prompt": "An epic fantasy dragon perched on a mountain peak, highly detailed",
      "width": 2048,
      "height": 2048,
      "guidanceScale": 2.5
    }
  }]
}
```

Watch for request-timeout behaviour at large dimensions — see [Runtime](#runtime) below.

### Image editing

Drop one or more source images into `images[]` and the same call switches to edit mode — same shape, prompt is treated as the edit instruction:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "seedream",
      "version": "v4",
      "prompt": "Make it a winter scene with snow falling",
      "width": 1024,
      "height": 1024,
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```

Up to 10 reference images per call.

## Reading the result

Standard `imageGen` output: an `images[]` array with one entry per requested image (`quantity`):

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.png" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Runtime

| Size | Typical wall time | `wait` recommendation |
|------|-------------------|-----------------------|
| ≤ 1024×1024 | 10–25 s | `wait=60` fine |
| 1536×1536 | 20–45 s | `wait=60` often fine; fall back to `wait=0` on busy periods |
| 2048–4096 per side | 40–90+ s | **Use `wait=0` + polling**; a synchronous wait can easily exceed the 100-second request timeout |

Combined with `quantity > 2`, high-res outputs cross the timeout quickly. Always poll for anything larger than 1536×1536 (~2.4 megapixels) unless you're running a known-fast version.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Flat per-image pricing by `version`:

```
total = base × quantity
```

| Version | Base (per image) |
|---------|------------------|
| `v4.5` | **60** |
| `v5.0-lite` | **52** |
| `v4` / `v3` | **40** |

Examples:

* `v4.5`, 1024², `quantity: 1` → **60 Buzz**
* `v4`, 1024², `quantity: 1` → **40 Buzz**
* `v5.0-lite`, 1024², `quantity: 1` → **52 Buzz**
* `v4.5` at 2048², `quantity: 1` → 60 Buzz *(dimensions don't affect the fixed base)*
* `v4` with 3 reference images, `quantity: 1` → 40 Buzz *(editing uses the same base)*

Dimensions and `images[]` count don't change Seedream's Buzz price — the provider charges per-image-generated, not per-megapixel. If you need 4K output, you pay the same as 1K.
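
Because the price depends only on `version` and `quantity`, a local estimate is a table lookup. The helper name `seedream_buzz` is hypothetical, and the figures are copied from the table above, so they may drift from live pricing; `whatif=true` gives the exact number.

```python
SEEDREAM_BASE_BUZZ = {"v3": 40, "v4": 40, "v4.5": 60, "v5.0-lite": 52}


def seedream_buzz(version: str = "v4.5", quantity: int = 1) -> int:
    """Flat per-image price; dimensions and reference images do not change it."""
    return SEEDREAM_BASE_BUZZ[version] * quantity
```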

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "version must be one of" | Typo or unsupported version slug | Use `v3`, `v4`, `v4.5`, or `v5.0-lite` (note the `v` prefix). |
| `400` with "width/height out of range" | Below 256 or above 4096 | Clamp to `256`–`4096`. |
| `400` with "images maxItems" | More than 10 source images on edit | Trim to 10. |
| Output too saturated / painterly | `guidanceScale` too high | Seedream prefers `2`–`3` — values above 5 typically degrade output. |
| Request timed out (`wait` expired) | High-res output, large quantity, or busy queue | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Seedream content filter | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Google image generation](./google) — commercial alternative with Nano Banana + Imagen 4
* [OpenAI image generation](./openai) — commercial alternative with GPT-Image + DALL·E
* [Flux 2](./flux2) / [Qwen](./qwen) / [SDXL](./sdxl) — open-weights / sdcpp alternatives on Civitai-hosted workers
* [Image upscaling](./image-upscaler) — chain after `imageGen` (Seedream's native 4096 may already cover your upscale needs)
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref`
* Full parameter catalog: the `SeedreamImageGenInput` schema in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface

---

---
url: /site/guide.md
description: Getting started with the public Civitai site API.
---

# Site API Guide

This section covers everything you need to start building against the Civitai
site API: generating a token, making your first request, paginating results,
handling errors, and working with AIR identifiers.

## Pages

* [Getting started](./getting-started) — create a token and make your first call.
* [Authentication](./authentication) — how bearer tokens work and when they're required.
* [Pagination](./pagination) — `page` vs. `cursor` and the 1000-offset cap.
* [Errors](./errors) — response shape and HTTP status codes.
* [AIR identifiers](./air) — the canonical URN format for Civitai resources.

For a per-endpoint breakdown (parameters, response fields, examples), see the
[Reference](../reference/).

---

---
url: /orchestration/guide/submitting-work.md
---

# Submitting Work

The orchestrator gives you two ways to submit a workflow:

1. **Generic** — [`POST /v2/consumer/workflows`](/orchestration/reference/operations/SubmitWorkflow) with a `steps` array and polymorphic `$type` on each step. Use when you're chaining steps, mixing step types, or building a typed SDK that handles the discriminator itself.
2. **Per-recipe** — `POST /v2/consumer/recipes/{recipe}` with a single typed body (no `$type`). Each recipe is a convenience endpoint that wraps a single-step workflow. Use when your integration only runs one step type and you want the cleanest schema surface — e.g. [`InvokeChatCompletionStepTemplate`](/orchestration/reference/operations/InvokeChatCompletionStepTemplate), [`InvokeComfyStepTemplate`](/orchestration/reference/operations/InvokeComfyStepTemplate), etc.

Both paths return the same [`Workflow`](/orchestration/reference/operations/GetWorkflow) shape. Pick whichever maps more cleanly to your caller; you can mix freely.

## The generic path

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?whatif=false&wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [
    {
      "$type": "imageGen",
      "input": {
        "engine": "flux",
        "prompt": "A cat astronaut floating through neon space",
        "width": 1024,
        "height": 1024
      }
    }
  ]
}
```

### Workflow-level body fields

Everything in the table lives alongside `steps`. See the full [`WorkflowTemplate` schema](/orchestration/reference/operations/SubmitWorkflow) for types.

| Field | Purpose |
|-------|---------|
| `steps` | Required. One or more step objects, each with its own `$type` and `input`. |
| `callbacks` | Webhooks for lifecycle events — see [Results & webhooks](./results-and-webhooks). |
| `tags` | Indexed string tags for later lookup via [`QueryWorkflows`](/orchestration/reference/operations/QueryWorkflows). Great for tenant / campaign / user IDs. |
| `metadata` | Arbitrary JSON attached to the workflow. Not indexed — use for notes, correlation IDs, UI hints. |
| `arguments` | Reserved for templating values referenced by steps. |
| `allowMatureContent` | Controls mature-content gating and which Buzz currency pays — see [Payments (Buzz)](#payments-buzz) below. |
| `experimental` | Marks the workflow as experimental; may relax some guardrails. |
| `upgradeMode` | How to handle a workflow paid with blue/green Buzz that turns out to produce mature content — see [Payments (Buzz)](#payments-buzz). |
| `currencies` | Restrict which Buzz currencies may settle this workflow (see [Payments (Buzz)](#payments-buzz)). |
| `tips` | Optional tip amount attached to the workflow. |

### Query parameters

| Param | Default | Purpose |
|-------|---------|---------|
| `wait` | `0` | Seconds to block waiting for the workflow to finish, capped by the **100-second request timeout**. If the workflow hasn't reached a terminal state by then, you get a `202` with the in-flight workflow — keep the `id` and continue via webhooks or polling. |
| `whatif` | `false` | If `true`, validates and resolves the workflow (provider, estimated cost, required resources) without actually running it. Great for CI smoke tests and cost previews. |
| `hideMatureContent` | `false` | If `true`, mature blobs on the response won't include a `url`. Useful for rendering in user-facing UIs without re-checking policy per-blob. |
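
One way to assemble these query parameters is sketched below using the standard library. `submit_url` is a hypothetical helper, and clamping `wait` to 100 on the client is a choice of this sketch that mirrors the request timeout described above, not documented server behaviour.

```python
from urllib.parse import urlencode

WORKFLOWS_URL = "https://orchestration.civitai.com/v2/consumer/workflows"
MAX_WAIT_SECONDS = 100  # the request timeout described above


def submit_url(wait: int = 0, whatif: bool = False, hide_mature_content: bool = False) -> str:
    """Build the SubmitWorkflow URL with lowercase booleans and a clamped wait."""
    params = {
        "wait": max(0, min(int(wait), MAX_WAIT_SECONDS)),
        "whatif": str(whatif).lower(),
        "hideMatureContent": str(hide_mature_content).lower(),
    }
    return f"{WORKFLOWS_URL}?{urlencode(params)}"
```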

### Status codes

| Code | Meaning |
|------|---------|
| `200 OK` | The workflow reached a terminal state within your `wait` budget. |
| `202 Accepted` | The workflow is still running. The body is the current workflow; continue via webhooks or [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow). |
| `400 Bad Request` | Body failed validation — see [Errors & retries](./errors-and-retries) for the response shape. |
| `401 Unauthorized` | Missing or invalid bearer token. |
| `403 Forbidden` | Token is valid but not allowed to use this recipe / resource / mature content flag. |
| `429 Too Many Requests` | Rate limited. Back off and retry. |

## Payments (Buzz)

Workflows are paid for with **Buzz**, Civitai's on-platform currency. Buzz balances live on your civitai.com account, not on the workflow — you never pass dollar amounts in the API. Three flavors matter for the orchestrator:

| Currency | Earned / bought | Valid for |
|----------|-----------------|-----------|
| **Blue** (`blue`) | Earned by interacting on civitai.com | SFW workflows only |
| **Green** (`green`) | Purchased | SFW workflows only |
| **Yellow** (`yellow`) | Purchased | SFW **and** NSFW workflows |

### Default charging

If you don't set `currencies`, the orchestrator charges your account in order **blue → green → yellow**, splitting across currencies if needed (e.g. spend all your blue first, top up the remainder with green or yellow). This maximises the value of earned Buzz.

### Restricting to specific currencies

Set `currencies` on the workflow body to restrict which Buzz currencies can settle it:

```json
{
  "steps": [ /* ... */ ],
  "currencies": ["green", "yellow"]
}
```

Only currencies you list will be drawn on. If none of them cover the cost, the submission is rejected with a payment error.
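
To make the charging order concrete, here is a small simulation of the behaviour described above. It is an illustration, not the orchestrator's implementation: `split_charge` is hypothetical, balances are supplied by the caller, and it assumes a restricted `currencies` list is still drawn in blue → green → yellow order, which this guide does not state.

```python
from typing import Optional

DEFAULT_ORDER = ("blue", "green", "yellow")


def split_charge(cost: int, balances: dict, currencies: Optional[list] = None) -> dict:
    """Split `cost` across allowed currencies in default order; raise if they cannot cover it."""
    allowed = [c for c in DEFAULT_ORDER if currencies is None or c in currencies]
    remaining = cost
    charges = {}
    for currency in allowed:
        take = min(remaining, balances.get(currency, 0))
        if take > 0:
            charges[currency] = take
            remaining -= take
    if remaining > 0:
        # Mirrors the rejection described above when the listed currencies fall short.
        raise ValueError("insufficient Buzz in the allowed currencies")
    return charges
```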

### Mature content and currency interaction

* Setting `allowMatureContent: true` forces payment in **yellow** Buzz (the only NSFW-capable currency).
* Setting `allowMatureContent: false` restricts to SFW and allows blue/green (plus yellow) to pay.
* Leaving `allowMatureContent` unset means the orchestrator decides *after* seeing which currency was charged: if the workflow settled with blue or green, it's treated as SFW; if it settled with yellow, mature content is allowed.

### `upgradeMode` — when the output turns out mature

If the workflow was paid with blue or green Buzz (SFW) but the generated content is classified as mature, `upgradeMode` controls what happens:

* `"manual"` — the mature output is withheld and the workflow waits on you. To release it, call [`UpdateWorkflow`](/orchestration/reference/operations/UpdateWorkflow) (`PUT`) or [`PatchWorkflow`](/orchestration/reference/operations/PatchWorkflow) (`PATCH`) with `allowMatureContent: true`. The orchestrator refunds the blue/green Buzz and charges yellow. If your yellow balance is insufficient, the update returns `400`.
* `"automatic"` — the orchestrator does that swap for you inline, charging the difference in yellow Buzz and delivering the mature output (or failing the workflow if yellow is insufficient).

### Previewing cost

Use `whatif=true` to get a cost estimate back without actually running the workflow. The response includes per-currency breakdowns so you can show users what they'd be charged before they commit.

## The per-recipe path

```http
POST https://orchestration.civitai.com/v2/consumer/recipes/chatCompletion?whatif=false&wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "model": "openai/gpt-oss-120b",
  "messages": [
    { "role": "user", "content": "Summarize this release note: ..." }
  ]
}
```

Each `/v2/consumer/recipes/{recipe}` endpoint is a thin wrapper that builds a single-step workflow for you, so:

* The request body is the step's `input` schema directly — no `$type`, no `steps` array.
* `whatif`, `wait`, `hideMatureContent` still apply.
* Callbacks aren't expressible here — if you need webhooks, use the generic path.
* The response is still a full [`Workflow`](/orchestration/reference/operations/GetWorkflow).
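
The relationship between the two paths can be shown by converting a single-step generic body into its per-recipe form. This sketch assumes the recipe path segment equals the step's `$type` (true for the `chatCompletion` and `textToSpeech` examples in these docs, but not confirmed for every recipe); `to_recipe_request` is a hypothetical name.

```python
def to_recipe_request(workflow_body: dict) -> tuple:
    """Return (path, body) for the per-recipe endpoint equivalent to a one-step workflow."""
    steps = workflow_body.get("steps", [])
    extras = set(workflow_body) - {"steps"}
    if len(steps) != 1 or extras:
        # Callbacks, tags, and other workflow-level fields need the generic path.
        raise ValueError("only a single step with no workflow-level fields can be converted")
    step = steps[0]
    return f"/v2/consumer/recipes/{step['$type']}", step["input"]
```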

## Choosing between them

| Your situation | Use |
|----------------|-----|
| Single step type, no callbacks, want a typed request body | Per-recipe |
| Multiple steps chained (e.g. generate → upscale → transcode) | Generic |
| Need webhooks, tags, metadata, or upgrade control | Generic |
| Building an SDK that consumes the polymorphic `steps` array generically | Generic |

## Updating and querying

Once submitted, a workflow is live until it reaches a terminal state. You can:

* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — fetch by ID
* [`QueryWorkflows`](/orchestration/reference/operations/QueryWorkflows) — list by tag, status, date range
* [`UpdateWorkflow`](/orchestration/reference/operations/UpdateWorkflow) / [`PatchWorkflow`](/orchestration/reference/operations/PatchWorkflow) — amend metadata / tags
* [`AddWorkflowTag`](/orchestration/reference/operations/AddWorkflowTag) / [`RemoveWorkflowTag`](/orchestration/reference/operations/RemoveWorkflowTag) — tag maintenance
* [`DeleteWorkflow`](/orchestration/reference/operations/DeleteWorkflow) — cancel / remove

For step-level updates (e.g. rewriting `input` before execution, if still `unassigned`), use [`UpdateWorkflowStep`](/orchestration/reference/operations/UpdateWorkflowStep) / [`PatchWorkflowStep`](/orchestration/reference/operations/PatchWorkflowStep).

---

---
url: /site/reference/tags.md
description: List model tags used on Civitai.
---

# Tags

Tags categorize models (and other content) on Civitai. Use this endpoint to
discover what's taggable; use the tag name as `?tag=` on
[`GET /models`](./models#list-models) to filter by it.

## List tags

```
GET /api/v1/tags
```

**Auth:** Public.

### Query parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `limit` | integer (1–200) | 20 | Number of items per page. |
| `page` | integer (≥ 1) | 1 | 1-indexed page number. |
| `query` | string | — | Full-text search on tag name. |

This endpoint is scoped to model tags (`entityType=Model`) — you cannot
fetch image-level tags through it.

### Response

```json
{
  "items": [
    {
      "name": "character",
      "link": "https://civitai.com/api/v1/models?tag=character"
    }
  ],
  "metadata": {
    "totalItems": 0,
    "currentPage": 1,
    "pageSize": 1,
    "totalPages": 1
  }
}
```

### Field notes

* `link` is pre-built — follow it to list models carrying the tag.
* `totalItems` and `totalPages` may be reported as `0` when an exact count
  isn't cheap to compute. Use `items.length` and `nextPage` to drive
  pagination rather than the counts.
* Responses are cached server-side for 60 seconds.

### Example

```bash
# Common model tags
curl "https://civitai.com/api/v1/tags?limit=20"
```
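
Since the counts in `metadata` may be `0`, a pagination loop is safer when it is driven by the number of items returned. The sketch below does that with a caller-supplied `fetch_page(page, limit)` function, which is a hypothetical stand-in for an HTTP call to `GET /api/v1/tags`; it stops on the first short page and does not use `nextPage`.

```python
from typing import Callable, Iterator


def iter_tags(fetch_page: Callable[[int, int], dict], limit: int = 20) -> Iterator[dict]:
    """Yield tags page by page until a page comes back with fewer than `limit` items."""
    page = 1
    while True:
        items = fetch_page(page, limit).get("items", [])
        yield from items
        if len(items) < limit:  # short (or empty) page means there is nothing left
            return
        page += 1
```

When the total is an exact multiple of `limit`, this makes one extra request that returns an empty page before stopping.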

---

---
url: /orchestration/recipes/text-to-speech.md
---

# Text-to-speech

The `textToSpeech` step type synthesises audio from text. Two modes on the same step:

* **Built-in speakers** — nine curated voices (`aiden`, `dylan`, `eric`, `ono_anna`, `ryan`, `serena`, `sohee`, `uncle_fu`, `vivian`). Pass `speaker: "<name>"` and go.
* **Voice cloning** — pass a `refAudioUrl` (and optionally the reference's transcript) and the output speaks in that voice.

Output is an Ogg Vorbis audio blob.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* Text to synthesise (English or Chinese work best; language auto-detected by default)
* *For voice cloning only:* a URL for a short reference audio clip (5–15 s of clean speech) and, ideally, its transcript

## The simplest request

Built-in speaker, explicit `language`, and `wait=0` because TTS often runs close to or past the synchronous 100-second window:

```http
POST https://orchestration.civitai.com/v2/consumer/recipes/textToSpeech?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "text": "Welcome to Civitai, the home of open-source generative AI.",
  "engine": "custom",
  "speaker": "vivian",
  "xVectorOnlyMode": false,
  "language": "English"
}
```

The response (after polling to `succeeded`) is a full [`Workflow`](/orchestration/reference/operations/GetWorkflow) whose step carries an `audioBlob` with a signed streaming URL.

::: tip Use `wait=0` for TTS
End-to-end processing for a single sentence is ~60–120 s including model load and queue wait. Short clips can sneak in under `wait=30` on a warm node, but relying on it is brittle — `wait=0` + polling is the safe default.
:::

## Via the generic workflow endpoint

Use this path for webhooks, tags, or chaining with other steps:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "textToSpeech",
    "input": {
      "text": "Welcome to Civitai, the home of open-source generative AI.",
      "engine": "custom",
      "speaker": "dylan",
      "xVectorOnlyMode": false,
      "language": "English"
    }
  }]
}
```

## Input fields

See the [`TextToSpeechInput` schema](/orchestration/reference/operations/InvokeTextToSpeechStepTemplate) for the complete definition. The `engine` property is a discriminator — each engine has its own set of valid fields.

### Shared fields

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `text` | ✅ | — | The text to synthesise. No hard length cap, but generation time scales roughly linearly with length. |
| `engine` | ✅ | `custom` | Which TTS backend to route to. `custom` covers both built-in speakers and voice cloning on one schema. |
| `language` | | `Auto` | Full English language name: `"English"`, `"Chinese"`, or `"Auto"`. ISO codes like `"en"` may not be recognised by the model — use the full English name. |

### `custom` engine

The `custom` engine flattens both modes into one request. Whether you're using a built-in speaker or cloning depends on which fields you provide:

| Field | Required | Used in mode | Notes |
|-------|----------|--------------|-------|
| `speaker` | ✅ for built-in | CustomVoice | One of `aiden`, `dylan`, `eric`, `ono_anna`, `ryan`, `serena`, `sohee`, `uncle_fu`, `vivian`. |
| `instruct` | | CustomVoice | Optional free-text style/tone instruction (e.g. `"Speak in a cheerful and enthusiastic tone."`). |
| `refAudioUrl` | ✅ for cloning | Base | URL of a reference audio clip (HTTP(S) URL or AIR URN). |
| `refText` | ✅ for cloning (unless `xVectorOnlyMode: true`) | Base | Accurate transcript of the reference audio — helps the model align voice features to phonemes. |
| `xVectorOnlyMode` | ✅ | both | `false` for most cases. `true` skips `refText` and uses only the speaker embedding from the reference — faster to wire up, slightly lower quality. |
| `maxNewTokens` | | both | Generation cap, optional. Leave unset unless you're seeing runaway output. |
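
The mode rules in the table can be captured in a small builder that picks the right fields for each mode. This is a sketch under stated assumptions: `tts_input` is a hypothetical helper, and rejecting a request that sets both `speaker` and `refAudioUrl` is a choice made here for clarity, not documented API behaviour.

```python
from typing import Optional


def tts_input(text: str, *, speaker: Optional[str] = None,
              ref_audio_url: Optional[str] = None, ref_text: Optional[str] = None,
              x_vector_only: bool = False, instruct: Optional[str] = None,
              language: str = "Auto") -> dict:
    """Build a `custom` engine input for either a built-in speaker or voice cloning."""
    if (speaker is None) == (ref_audio_url is None):
        raise ValueError("pass exactly one of speaker or ref_audio_url")
    payload = {"text": text, "engine": "custom", "language": language,
               "xVectorOnlyMode": x_vector_only}
    if speaker is not None:  # CustomVoice mode
        payload["speaker"] = speaker
        if instruct:
            payload["instruct"] = instruct
        return payload
    # Base (cloning) mode: `instruct` is listed for CustomVoice only, so it is not sent here.
    if not x_vector_only and not ref_text:
        raise ValueError("ref_text is required unless x_vector_only is True")
    payload["refAudioUrl"] = ref_audio_url
    if ref_text and not x_vector_only:
        payload["refText"] = ref_text
    return payload
```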

## Built-in speakers with a style prompt

`instruct` lets you nudge tone/pacing without switching voices. Example: the `dylan` voice as a cheerful broadcaster:

```json
{
  "text": "Breaking news — we are live from the Civitai studios.",
  "engine": "custom",
  "speaker": "dylan",
  "instruct": "Speak in a cheerful and enthusiastic broadcaster tone.",
  "xVectorOnlyMode": false,
  "language": "English"
}
```

Short, specific directions work better than long prose (`"slow and serious"` beats a paragraph). The model doesn't always follow the instruction — treat it as a bias, not a guarantee.

## Voice cloning (Base mode)

Pass `refAudioUrl` + `refText` to clone the voice from a short reference clip:

```json
{
  "text": "This sentence is synthesized by cloning the voice from the reference audio.",
  "engine": "custom",
  "refAudioUrl": "https://.../reference.wav",
  "refText": "She had your dark suit in greasy wash water all year.",
  "xVectorOnlyMode": false,
  "language": "English"
}
```

Guidance for the reference:

* **Length**: 5–15 seconds of clean speech works best. Longer clips don't help and may time out.
* **Quality**: single speaker, minimal background noise, no music. Podcast intros and noisy call recordings lower the cloning quality noticeably.
* **`refText` accuracy**: transcribe the reference exactly — including punctuation and capitalisation — or skip it via `xVectorOnlyMode: true`.
* **Reach**: the `refAudioUrl` must be fetchable by the orchestrator the same way [transcription's `mediaUrl`](./transcription#choosing-a-source-url) is. CDN-served files are safe; sites that inject per-request session state break the streaming fetch.

### `xVectorOnlyMode: true` — skip the reference transcript

If you don't have (or don't want to supply) a transcript, set `xVectorOnlyMode: true`. The model uses only the speaker embedding from the reference clip, no alignment:

```json
{
  "text": "Using only the speaker embedding from the reference — no transcript required.",
  "engine": "custom",
  "refAudioUrl": "https://.../reference.wav",
  "xVectorOnlyMode": true,
  "language": "English"
}
```

Trade-off: one less input to get right; slightly less faithful cloning on sentences whose phonetic content differs from the reference. Start with `xVectorOnlyMode: false` when quality matters.

## Chaining: transcribe then re-speak

A common pipeline — transcribe an existing clip, then synthesise the same text in a different voice:

```json
{
  "steps": [
    {
      "$type": "transcription",
      "name": "quote",
      "input": {
        "mediaUrl": "https://cdn.jsdelivr.net/gh/openai/whisper@main/tests/jfk.flac"
      }
    },
    {
      "$type": "textToSpeech",
      "name": "reread",
      "input": {
        "text": { "$ref": "quote", "path": "output.text" },
        "engine": "custom",
        "speaker": "dylan",
        "instruct": "Speak in a confident presidential tone.",
        "xVectorOnlyMode": false,
        "language": "English"
      }
    }
  ]
}
```

The `{ "$ref": "quote", "path": "output.text" }` reference feeds the transcribed string into `reread`'s `text` field at runtime. See [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism).

## Reading the result

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "textToSpeech",
    "status": "succeeded",
    "output": {
      "audioBlob": {
        "id": "XSSD3Y6B6BSPFBC3QHV0WD8QJ0.ogg",
        "url": "https://orchestration-new.civitai.com/v2/consumer/streaming-blobs/XSSD3Y6B6BSPFBC3QHV0WD8QJ0.ogg?sig=...",
        "urlExpiresAt": "2027-04-13T23:44:20Z",
        "type": "audio",
        "duration": null
      },
      "speaker": "vivian",
      "modelType": "custom_voice"
    }
  }]
}
```

Fields:

* **`audioBlob.url`** — signed **streaming** URL for the generated audio (Ogg Vorbis, `.ogg`). Stream it directly in an `<audio src>` tag or download the bytes.
* **`audioBlob.id`** — blob identifier, also usable via [`GetBlob`](/orchestration/reference/operations/GetBlob) if you need a fresh URL later.
* **`audioBlob.duration`** — output length in seconds when available (may be `null` until the blob is fully materialised).
* **`speaker`** — the speaker name used (only populated for CustomVoice / built-in modes).
* **`modelType`** — `"custom_voice"` for built-in speakers, `"base"` for reference-audio cloning. `null` if the backend didn't classify.

Blob URLs are signed and expire — store the audio locally or call [`GetBlob`](/orchestration/reference/operations/GetBlob) when the URL expires.
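A small sketch of reading these fields client-side, assuming the workflow response has already been parsed into a dictionary. The helper name `audio_blob_info` is hypothetical; it only touches the fields shown above.

```python
from datetime import datetime, timezone


def audio_blob_info(workflow: dict, now: datetime) -> dict:
    """Pull the audio blob out of the first step and flag an expired URL."""
    step = workflow["steps"][0]
    if step["status"] != "succeeded":
        raise ValueError(f"step has status {step['status']!r}, no audio to read")
    blob = step["output"]["audioBlob"]
    # Older Python versions cannot parse a trailing "Z", so normalise it.
    expires_at = datetime.fromisoformat(blob["urlExpiresAt"].replace("Z", "+00:00"))
    return {
        "id": blob["id"],
        "url": blob["url"],
        "expired": now >= expires_at,
        "duration": blob.get("duration"),  # may be None
    }


sample = {
    "status": "succeeded",
    "steps": [{
        "name": "0",
        "$type": "textToSpeech",
        "status": "succeeded",
        "output": {
            "audioBlob": {
                "id": "XSSD3Y6B6BSPFBC3QHV0WD8QJ0.ogg",
                "url": "https://orchestration-new.civitai.com/v2/consumer/streaming-blobs/XSSD3Y6B6BSPFBC3QHV0WD8QJ0.ogg?sig=...",
                "urlExpiresAt": "2027-04-13T23:44:20Z",
                "type": "audio",
                "duration": None,
            },
        },
    }],
}

info = audio_blob_info(sample, now=datetime.now(timezone.utc))
print(info["id"], "expired" if info["expired"] else "still valid")
```

When `expired` is true, request a fresh URL for `info["id"]` through `GetBlob` rather than retrying the stale one.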

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Character-based with a minimum floor, multiplied by 2.6 when a built-in speaker is used:

```
textLength = max(1, ceil(len(text) / 100))
total      = textLength × (speaker != null ? 2.6 : 1)
```

| Shape | Buzz |
|-------|------|
| Base mode (voice cloning), 60 characters | **1** |
| Base mode, 500 characters | ~5 |
| Base mode, 2 000 characters | ~20 |
| CustomVoice (built-in `speaker`), 60 characters | ~2.6 |
| CustomVoice, 500 characters | ~13 |
| CustomVoice, 2 000 characters | ~52 |

Voice cloning via `refAudioUrl` is cheaper per character than picking a built-in `speaker` — the 2.6× multiplier only applies when `speaker` is set. `instruct`, `language`, and `maxNewTokens` don't affect cost.
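The formula above translates directly into a local estimator. Treat this as a rough sketch: it assumes `len(text)` in Python counts characters the same way the backend does, and `whatif=true` remains the authoritative preview.

```python
import math
from typing import Optional


def estimate_tts_buzz(text: str, speaker: Optional[str] = None) -> float:
    """Estimate Buzz cost: 1 unit per 100 characters, x2.6 with a built-in speaker."""
    units = max(1, math.ceil(len(text) / 100))
    multiplier = 2.6 if speaker is not None else 1
    return units * multiplier


print(estimate_tts_buzz("x" * 500))                     # base mode
print(estimate_tts_buzz("x" * 500, speaker="vivian"))   # CustomVoice
```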

## Runtime

End-to-end time for one short sentence is typically 60–120 seconds including model load, queue wait, and inference. Longer text scales roughly linearly. **Always submit with `wait=0`** and poll or subscribe to webhooks; `wait=30` synchronous calls will usually time out.
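A polling loop for the `wait=0` path might look like the sketch below. It is transport-agnostic: `fetch` is any callable you supply that returns the current workflow as a dictionary. The set of terminal statuses is partly an assumption (`succeeded` and `failed` appear in this guide, `canceled` is a guess), and `"processing"` in the demo is only a placeholder for any non-terminal status.

```python
import time
from typing import Callable

# "succeeded" and "failed" appear in this guide; "canceled" is an assumed extra.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_workflow(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until the workflow reaches a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        workflow = fetch()
        if workflow.get("status") in TERMINAL_STATUSES:
            return workflow
        if clock() >= deadline:
            raise TimeoutError("workflow still running after the polling deadline")
        sleep(interval)


# Offline demo: a canned sequence stands in for repeated GET requests.
canned = iter([{"status": "processing"}, {"status": "succeeded"}])
final = poll_workflow(lambda: next(canned), interval=0.0)
print(final["status"])
```

Injecting `sleep` and `clock` keeps the loop easy to test without real waiting. For production traffic, webhooks avoid the polling cost entirely.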

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` validation error on `speaker` | Value not in the nine built-in names | Use one of the allowed speakers; check the [`CustomTextToSpeechInput` schema](/orchestration/reference/operations/InvokeTextToSpeechStepTemplate). |
| `400` validation error on `language` | Passed an ISO code like `"en"` | Use the full English name: `"English"`, `"Chinese"`, or `"Auto"`. |
| `400` validation error on `xVectorOnlyMode` | Missing — the field is required on the `custom` engine | Always include it; set to `false` unless you explicitly want x-vector-only cloning. |
| Voice cloning output sounds robotic | Reference clip is too noisy, too short, or contains multiple speakers | Supply a cleaner 5–15 s single-speaker reference. |
| Voice cloning ignores the reference entirely | `refAudioUrl` couldn't be fetched by the worker (cookie-gated host, 403, redirect loop) | Host the reference on a CDN / S3 direct / GitHub raw URL. |
| Prosody doesn't match `instruct` | The directive is too long or contradictory with the speaker's natural register | Keep `instruct` short and specific; try a different built-in speaker. |
| Request timed out (`wait` expired) | Synthesis too slow to finish in the synchronous window | Resubmit with `wait=0` and poll, or register a webhook. |
| Step `failed`, `reason = "blocked"` | Text or reference audio hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`InvokeTextToSpeechStepTemplate`](/orchestration/reference/operations/InvokeTextToSpeechStepTemplate) — the per-recipe endpoint
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/textToSpeech/openapi.yaml) — standalone OpenAPI 3.1 YAML for this endpoint, ready to import into Postman / Insomnia / OpenAPI Generator
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — generic path for chaining
* [Multi-speaker dialogue](./multi-speaker-dialogue) — overlay several TTS clips for debate, interview, or audio-drama scenes (uses the `audioMix` step)
* [Transcription](./transcription) — the inverse: audio → text
* [ACE-Step music generation](./ace-step-audio) — lyrics + style → full song audio (different recipe, sibling capability)
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running workflows
* [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) — feeding `output.text` into a TTS step via `$ref`

---

---
url: /orchestration/mcp/tools.md
---

# Tools, prompts, and resources

The MCP server advertises live schemas via `tools/list`, `prompts/list`, and `resources/list` on connect, so your client always sees authoritative parameter shapes. The tables below summarize what's exposed so you know what to look for and which REST recipe each tool maps to.
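For orientation, the three discovery calls are ordinary JSON-RPC 2.0 requests, as defined by the MCP specification. The sketch below only builds the messages; transport, session setup, and authentication are out of scope, and in practice an MCP client library issues these for you. The helper name `jsonrpc_request` is hypothetical.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope with a fresh numeric id."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


discovery = [
    jsonrpc_request(method)
    for method in ("tools/list", "prompts/list", "resources/list")
]
for message in discovery:
    print(json.dumps(message))
```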

## Tools

### Image generation

| Tool | Purpose |
|---|---|
| `generate_image` | Text-to-image and image-edit. Engines: `sdcpp`, `seedream`, `flux1-kontext`, `openai`, `gemini`, `grok`, `google`, `wan`. Returns a resource link to each generated blob. |
| `upscale_image` | Repeated 2× upscale (1–3 passes → up to 8×). |
| `convert_image` | Format conversion (jpeg / png / webp / gif) with optional resize. |

Behavior maps directly to the [image recipes](/orchestration/recipes/) — see [Flux 2](/orchestration/recipes/flux2), [SDXL](/orchestration/recipes/sdxl), [Image upscaling](/orchestration/recipes/image-upscaler), and [Image conversion](/orchestration/recipes/convert-image) for parameter and output details.

### Video generation

| Tool | Purpose |
|---|---|
| `generate_video` | Text-to-video and image-to-video. Engines: `kling-v3`, `kling`, `haiper`, `veo3`, `wan`, `minimax`, `vidu`, `sora`, `grok`, `lightricks`. |
| `extract_video_frames` | Sample frames at a configurable rate; perceptual-hash deduplication filters near-identical frames. |
| `upscale_video` | FlashVSR 2–4× upscaling. |

See [WAN](/orchestration/recipes/wan), [Kling](/orchestration/recipes/kling), [Veo 3](/orchestration/recipes/veo3), and [Video upscaling](/orchestration/recipes/video-upscaler) for matching REST recipes.

### Audio

| Tool | Purpose |
|---|---|
| `transcribe_audio` | Speech-to-text with optional word-level timestamps. |
| `text_to_speech` | TTS with selectable speakers (`aiden`, `dylan`, `eric`, `ryan`, `serena`, `sohee`, `vivian`). |

See [Transcription](/orchestration/recipes/transcription) and [Text-to-speech](/orchestration/recipes/text-to-speech).

### Music

| Tool | Purpose |
|---|---|
| `generate_music` | ACE-Step 1.5. Supports structured lyrics with section markers like `[Verse]`, `[Chorus]`, `[Bridge]`. Returns MP3 audio, or MP4 video when a cover image is attached. |

See [ACE-Step music generation](/orchestration/recipes/ace-step-audio).

### Media analysis

| Tool | Purpose |
|---|---|
| `caption_media` | Generate a descriptive caption for an image or video. |
| `rate_media` | NSFW level, blocked status, content labels. Optional sub-analyses for age classification, face recognition, AI detection, and anime recognition. |
| `tag_media` | WD-style tagging with confidence scores and content-rating distribution. |

### Language models

| Tool | Purpose |
|---|---|
| `chat_completion` | OpenRouter passthrough — any model from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, Qwen, etc. Supports multi-turn `system` / `user` / `assistant` messages. |

See [Chat completion](/orchestration/recipes/chat-completion) for the model ID format.

### Prompt utilities

| Tool | Purpose |
|---|---|
| `enhance_prompt` | Analyze and rewrite a generation prompt for a target ecosystem (`sd1`, `sdxl`, `flux`, `ltx2`). Returns the improved prompt with issues and recommendations. |

See [Prompt enhancement](/orchestration/recipes/prompt-enhancement).

### Discovery

| Tool | Purpose |
|---|---|
| `find_models` | Natural-language model search across image, video, audio, and chat catalogs. Accepts queries like `"fast cheap chat model"` or a metrics ID like `image/flux1-kontext/pro`. |

### Workflow management

| Tool | Purpose | Auth |
|---|---|---|
| `submit_workflow` | Submit raw workflow JSON — same shape as [`POST /v2/consumer/workflows`](/orchestration/reference/operations/SubmitWorkflow). Use when a specific tool doesn't cover your case. | optional |
| `get_workflow` | Status and output by workflow ID. | optional |
| `cancel_workflow` | Cancel a running workflow. | optional |
| `list_workflows` | Recent workflows for the authenticated user. Supports `take`, `tags`, `excludeFailed`. | **required** |

## Prompts

The server ships three built-in MCP prompts that return ready-to-use guidance for multi-step pipelines. Clients can list and invoke them like any MCP prompt.

| Prompt | Input | What it returns |
|---|---|---|
| `image_generation_guide` | `intent` (e.g. `"photorealistic product photo"`, `"anime character"`, `"fast draft"`) | Engine comparison table, quick recommendations, parameter tips. |
| `video_creation_pipeline` | `intent` (e.g. `"product showcase"`, `"music video clip"`, `"talking head"`) | Recommended pipeline (image → video → upscale), engine selection matrix, example tool sequence. |
| `content_analysis_pipeline` | `mediaUrl` | Stepwise plan: caption → tag → rate, with notes on when to use each. |

## Resources

| URI template | MIME | Behavior |
|---|---|---|
| `spine://blobs/{blobId}` | `application/octet-stream` | Images are inlined as base64 content. Videos and audio return a 5-minute signed download URL. Returns an error if the blob does not exist. |

Tools that produce media include resource links pointing at this URI template, so MCP clients can render outputs inline without a separate download step.
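As an illustration, a client that wants to fetch a blob explicitly would expand the URI template and issue a standard MCP `resources/read` request. This is a sketch under assumptions: the helper names are hypothetical, `EXAMPLE_BLOB_ID` is a made-up identifier, and the blob ID is inserted as-is without extra escaping.

```python
def blob_resource_uri(blob_id: str) -> str:
    """Expand the spine://blobs/{blobId} template for one blob."""
    return f"spine://blobs/{blob_id}"


def resources_read_request(blob_id: str, request_id: int = 1) -> dict:
    """Build the JSON-RPC 2.0 envelope for an MCP resources/read call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": blob_resource_uri(blob_id)},
    }


# EXAMPLE_BLOB_ID is a placeholder, not a real blob.
print(resources_read_request("EXAMPLE_BLOB_ID"))
```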

## Capabilities advertised on `initialize`

```json
{
  "protocolVersion": "2024-11-05",
  "capabilities": {
    "logging": {},
    "prompts": { "listChanged": true },
    "resources": { "listChanged": true },
    "tools": { "listChanged": true }
  },
  "serverInfo": {
    "name": "civitai-orchestration",
    "title": "Civitai Orchestration MCP Server",
    "description": "Generate images, videos, audio, and more via the Civitai Orchestration platform"
  }
}
```

## Related

* [MCP Server overview](/orchestration/mcp/) — endpoint, auth, and client setup
* [Recipes](/orchestration/recipes/) — REST equivalents with runnable examples
* [API Reference](/orchestration/reference/) — generated from the OpenAPI spec

---

---
url: /orchestration/recipes/transcription.md
---

# Transcription

The `transcription` step type takes an audio or video URL and returns the spoken text, using **Qwen3-ASR-1.7B** for recognition plus **Qwen3-ForcedAligner-0.6B** for timestamp alignment. It handles dozens of languages out of the box, auto-detects the spoken language, and can return **phrase-level** timestamps (one entry per spoken phrase/clause, each containing multiple words) suitable for captions and seek-aware UIs.

Common uses:

* Transcribe podcasts, interviews, voice memos
* Generate captions (SRT / VTT) for video content via timestamps
* Feed speech into text-processing pipelines (summarization, search indexing)
* Pull the dialogue out of an existing video

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A publicly-fetchable audio or video URL (`.mp3`, `.wav`, `.m4a`, `.mp4` with an audio track, etc.). Civitai CDN URLs work directly.

## The simplest request

Use the per-recipe endpoint when you just need the text from one piece of audio:

```http
POST https://orchestration.civitai.com/v2/consumer/recipes/transcription?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "mediaUrl": "https://.../interview.mp3"
}
```

Defaults: language is auto-detected, and phrase-level timestamps are returned. The response is a full [`Workflow`](/orchestration/reference/operations/GetWorkflow) whose single step carries the transcript in `output.text`.
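The same request can be prepared from Python with only the standard library. This sketch builds the request object without sending it; the helper name `build_transcription_request` is hypothetical, and the token and media URL are the placeholders from the example above.

```python
import json
import urllib.request

RECIPE_URL = "https://orchestration.civitai.com/v2/consumer/recipes/transcription"


def build_transcription_request(
    token: str, media_url: str, wait: int = 0
) -> urllib.request.Request:
    """Prepare (but do not send) the per-recipe transcription POST."""
    body = json.dumps({"mediaUrl": media_url}).encode("utf-8")
    return urllib.request.Request(
        f"{RECIPE_URL}?wait={wait}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_transcription_request("<your-token>", "https://.../interview.mp3")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```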

::: tip Choosing a source URL
The URL must be **publicly fetchable** AND served by a host that supports HTTP range requests and consistent responses across requests — ffprobe streams + seeks rather than downloading the whole file. Sites that inject per-request session cookies (common on `wp-content/uploads` endpoints behind AWS ALBs) often break the seek and fail with `Failed to read frame size: Could not seek to N`. CDN-served files (jsdelivr, GitHub raw, S3 without redirect) are safe defaults; the Civitai CDN works directly.
:::

::: tip Use `wait=0` for long audio
Billing is computed per 30 s of audio (minimum 1 unit), and real processing time roughly tracks audio length + queue wait. Anything longer than ~90 s of audio is a candidate for `wait=0` + polling; a multi-minute file will blow past the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline).
:::

## Via the generic workflow endpoint

Equivalent request through [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — use this path when you need webhooks, tags, or to chain with other steps:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "transcription",
    "input": {
      "mediaUrl": "https://.../interview.mp3",
      "returnTimeStamps": true
    }
  }]
}
```

## Input fields

See the [`TranscriptionInput` schema](/orchestration/reference/operations/InvokeTranscriptionStepTemplate) for the full definition.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `mediaUrl` | ✅ | — | URL of the audio (or video with an audio track). Must be publicly fetchable without auth. ffprobe must be able to stream + seek the response (see tip above). |
| `language` | | auto-detect | ISO 639-1 hint like `"en"`, `"zh"`, `"es"`, `"ja"`. Omit to let the model detect. Setting it anyway usually improves accuracy on short or noisy clips. The *output* language is returned as a full English name (`"English"`, `"Spanish"`, …), not the ISO code. |
| `context` | | — | Optional free-text prompt describing the subject matter — helps the model spell unusual proper nouns, technical terms, or domain jargon correctly. |
| `returnTimeStamps` | | `true` | Whether to return phrase-level `startTime` / `endTime` pairs. Leave `true` unless you don't need them; the extra cost is negligible. |

## Language hints

Auto-detection is reliable on clear speech but can flip on short clips, heavily accented speakers, or audio that starts with non-speech (music, silence). If you know the language upfront, set it:

```json
{
  "mediaUrl": "https://.../audio.mp3",
  "language": "en"
}
```

The detected (or forced) language is returned in `output.language` — note it comes back as the full English name (`"English"`, `"Japanese"`, …), not the ISO code you passed in.

## Context hints for accuracy

Provide a short free-text `context` to bias the model toward correct spellings for proper nouns, acronyms, or technical vocabulary. For a tech podcast:

```json
{
  "mediaUrl": "https://.../podcast.mp3",
  "language": "en",
  "context": "Technical discussion about Kubernetes, CRDs, and Flux CD."
}
```

Think of `context` like a prompt passed to the ASR model — a sentence or two of topic / vocabulary hints usually helps more than a long verbose description.

## Generating captions / SRT

Video files work as a `mediaUrl` too — pass an `.mp4` (or any container FFmpeg understands) and the audio track is extracted automatically. Combine with `returnTimeStamps: true` to get everything you need to emit an SRT or VTT file:

```json
{
  "mediaUrl": "https://.../clip.mp4",
  "returnTimeStamps": true
}
```

The `output.timeStamps` array holds one entry per spoken phrase, each with `{ text, startTime, endTime }` in seconds. For subtitle generation, each entry can map directly to one caption line; merge or split entries client-side if you need longer or shorter lines.
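One way to turn `output.timeStamps` into SRT text is sketched below. It emits one cue per entry and does no merging or line-length balancing; the function names are hypothetical.

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def timestamps_to_srt(entries: list) -> str:
    """Render each timeStamps entry as one numbered SRT cue."""
    cues = []
    for index, entry in enumerate(entries, start=1):
        cues.append(
            f"{index}\n"
            f"{to_srt_time(entry['startTime'])} --> {to_srt_time(entry['endTime'])}\n"
            f"{entry['text']}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


entries = [
    {"text": "And so my fellow Americans", "startTime": 0.32, "endTime": 2.16},
    {"text": "ask not", "startTime": 3.28, "endTime": 4.32},
]
print(timestamps_to_srt(entries))
```

For WebVTT, the same structure applies with a `WEBVTT` header line and a `.` instead of `,` before the milliseconds.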

## Reading the result

A successful `transcription` step emits the full transcript plus structured timing. Real output from the JFK test clip in the Whisper repository (`https://cdn.jsdelivr.net/gh/openai/whisper@main/tests/jfk.flac`):

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "transcription",
    "status": "succeeded",
    "output": {
      "text": "And so, my fellow Americans, ask not what your country can do for you. Ask what you can do for your country.",
      "language": "English",
      "timeStamps": [
        { "text": "And so my fellow Americans",         "startTime": 0.32, "endTime": 2.16 },
        { "text": "ask not",                           "startTime": 3.28, "endTime": 4.32 },
        { "text": "what your country can do for you",  "startTime": 5.36, "endTime": 7.52 },
        { "text": "Ask what you can do for your country", "startTime": 8.16, "endTime": 10.48 }
      ],
      "elapsedSeconds": 0.876
    }
  }]
}
```

Fields:

* **`text`** — the full transcript as one string, with punctuation and casing restored
* **`language`** — the detected (or hinted) language as a **full English name** (e.g. `"English"`, `"Mandarin"`, `"Spanish"`). Not the ISO code you pass in.
* **`timeStamps`** — one entry per phrase/clause (spans multiple words each); empty array if `returnTimeStamps: false`
* **`elapsedSeconds`** — server-side model runtime (excludes queue wait — this is just the recognition pass)

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Duration-based with a minimum floor of 1:

```
total = max(1, ceil(durationSeconds / 30))
```

| Audio length | Buzz |
|-------------|------|
| ≤ 30 s | **1** |
| 31–60 s | **2** |
| 5 min | ~10 |
| 30 min | ~60 |
| 60 min | ~120 |

Transcription is the cheapest speech path Civitai exposes — every 30 seconds of source is one Buzz, rounded up. The `language`, `context`, and `returnTimeStamps` fields don't affect cost.
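The pricing rule is simple enough to estimate locally before submitting. A minimal sketch, assuming you already know the clip's duration in seconds; `whatif=true` remains the authoritative preview.

```python
import math


def estimate_transcription_buzz(duration_seconds: float) -> int:
    """One Buzz per started 30 s block of audio, with a floor of 1."""
    return max(1, math.ceil(duration_seconds / 30))


for seconds in (12, 31, 300, 1800, 3600):
    print(seconds, "s ->", estimate_transcription_buzz(seconds), "Buzz")
```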

## Runtime

Real-time-factor (processing time ÷ audio length) is well below 1 on Qwen3-ASR — a 5-minute recording typically finishes in well under a minute of server-side compute, plus queue wait. Plan for `wait=0` + polling on anything beyond ~90 s of audio.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "Unable to analyze audio file: … Failed to read frame size: Could not seek to N" | The host doesn't honor HTTP range requests or injects per-request session state (AWS ALB cookies, etc.), so ffprobe's streaming seek fails | Use a CDN-served file (jsdelivr, GitHub raw, S3 direct, Civitai CDN) instead. |
| `400` with "Unable to analyze audio file" (no seek error) | Source couldn't be probed (corrupt, wrong container, DNS failure, 403/404, redirect loop) | Verify the URL resolves with a direct `curl` and returns valid audio. |
| `400` with "Input audio resource does not exist" | Passed an AIR that doesn't resolve | Pass a plain URL instead, or confirm the AIR is correct. |
| `output.language` is wrong | Auto-detection failed on a short / noisy clip | Set `language` explicitly. |
| Proper nouns / jargon misspelled | Model hasn't seen the term often | Add a `context` string describing the subject matter and vocabulary. |
| Empty or partial transcript | Audio contains long silence, music, or very low-level speech | Trim silence / pre-normalize audio; confirm speech is actually audible at a reasonable volume. |
| Request timed out (`wait` expired) | Audio too long to finish in the synchronous window | Resubmit with `wait=0` and poll, or register a webhook. |
| Step `failed`, `reason = "blocked"` | Audio hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`InvokeTranscriptionStepTemplate`](/orchestration/reference/operations/InvokeTranscriptionStepTemplate) — the per-recipe endpoint
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/transcription/openapi.yaml) — standalone OpenAPI 3.1 YAML for this endpoint, ready to import into Postman / Insomnia / OpenAPI Generator
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — generic path for chaining
* [Text-to-speech](./text-to-speech) — the inverse: text → audio
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running workflows
* [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) — how to feed `output.text` into a downstream step

---

---
url: /orchestration.md
---


---

---
url: /site/reference/users.md
description: 'Look up users by ID or username, and inspect the authenticated caller.'
---

# Users

## Get the current user

```
GET /api/v1/me
```

**Auth:** Authenticated — a valid token is required. Returns `401` otherwise.

Use this to confirm which account a token belongs to, check membership
status, or surface the caller's subscription tier in your own UI.

### Response

```json
{
  "id": 12345,
  "username": "you",
  "tier": "founder",
  "status": "active",
  "isMember": true,
  "subscriptions": ["monthly"]
}
```

### Field notes

| Field | Description |
|-------|-------------|
| `id` | Civitai user ID. |
| `username` | Current username. |
| `tier` | Membership tier — `free`, `founder`, `bronze`, `silver`, `gold`. |
| `status` | One of `active`, `muted`, `banned`. |
| `isMember` | Shortcut: `true` when `tier !== 'free'`. |
| `subscriptions` | Names of active subscription products. Empty array when none. |

### Errors

```
HTTP/2 401
{"error":"Unauthorized"}
```

Returned for missing, malformed, or revoked tokens alike — the API does not
distinguish between them.

### Example

```bash
curl -H "Authorization: Bearer $CIVITAI_TOKEN" \
  "https://civitai.com/api/v1/me"
```

::: tip
Browsers block cross-origin requests that carry credentials unless the server
allowlists the origin. If the Try It above fails with a CORS error from
`developer.civitai.com`, use `curl` locally instead — the endpoint itself is
working.
:::

## Look up users

```
GET /api/v1/users
```

**Auth:** Public.

Resolve user IDs or do a username prefix search. Returns just
`{id, username, avatarNsfw}` per result; this endpoint is intentionally lean. Use it to map IDs to
usernames (e.g. when post-processing `/images` results) or to power a
"find user" autocomplete.

### Query parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `ids` | comma-separated integers | — | Look up specific user IDs. When set, the response limit is `ids.length`. |
| `query` | string | — | Username prefix match (`username LIKE 'query%'`). Returns the shortest matches first. |

When neither `ids` nor `query` is supplied, the endpoint returns the first 5
users in the database. That's almost never what you want — always pass one
of the two.
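A small URL builder can enforce that rule on the client side. This is a sketch: the helper name `build_users_url` is hypothetical, and commas are left unescaped to match the curl examples on this page.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

USERS_URL = "https://civitai.com/api/v1/users"


def build_users_url(
    ids: Optional[Iterable[int]] = None, query: Optional[str] = None
) -> str:
    """Build a /users lookup URL, refusing the unfiltered call."""
    if ids is None and query is None:
        raise ValueError("pass ids or query; the bare endpoint returns the first 5 users")
    params = {}
    if ids is not None:
        # int() rejects non-numeric IDs before the server has to.
        params["ids"] = ",".join(str(int(i)) for i in ids)
    if query is not None:
        params["query"] = query
    return f"{USERS_URL}?{urlencode(params, safe=',')}"


print(build_users_url(ids=[12345, 67890]))
print(build_users_url(query="yo"))
```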

### Response

```json
{
  "items": [
    { "id": 12345, "username": "you", "avatarNsfw": "None" },
    { "id": 67890, "username": "yousef", "avatarNsfw": "None" }
  ]
}
```

| Field | Description |
|-------|-------------|
| `id` | Civitai user ID. |
| `username` | Current username. |
| `avatarNsfw` | Browsing-level label for the user's avatar (`None`, `Soft`, `Mature`, `X`). Always `None` unless the user has set a mature avatar. |

Deleted and system users (`id = -1`) are filtered out automatically.

### Errors

| Status | Body | Cause |
|--------|------|-------|
| `400` | Zod error JSON | Malformed `ids` (non-numeric) or other parse failure. |
| `500` | `{"message":"An unexpected error occurred", "error": ...}` | Internal failure. |

### Example

```bash
# Map a list of IDs to usernames
curl "https://civitai.com/api/v1/users?ids=12345,67890"

# Username autocomplete
curl "https://civitai.com/api/v1/users?query=yo"
```

---

---
url: /site/reference/vault.md
description: 'Manage a member''s saved model versions — list, check, add, and remove.'
---

# Vault

The **Civitai Vault** is a personal collection of model versions kept on
behalf of a paid member. It survives even if the original model is deleted
from the site, so creators can keep using resources they relied on.

::: warning Membership required
All vault endpoints require an active Civitai membership (`bronze`, `silver`,
`gold`, or `founder`). Free-tier callers get a `200` response with
`{"vault": null}` — there is no `403` to distinguish "no membership" from
"empty vault". Check [`GET /me`](./users#get-the-current-user) for the
caller's `tier` if you need to gate ahead of time.
:::

::: warning Authentication
All vault endpoints require a Civitai API token. Pass it as
`Authorization: Bearer <token>`. See [Authentication](../guide/authentication).
:::

## Get or create the vault

```
GET /api/v1/vault/get
```

Returns the caller's vault, creating one on first call.

### Response

```json
{
  "vault": {
    "userId": 12345,
    "storageKb": 1048576,
    "usedStorageKb": 6775430,
    "meta": {},
    "updatedAt": "2025-04-01T08:30:00.000Z"
  }
}
```

| Field | Description |
|-------|-------------|
| `userId` | Vault owner's user ID. Doubles as the vault's primary key. |
| `storageKb` | Total storage allowance, derived from the user's active membership(s). |
| `usedStorageKb` | Sum of `modelSizeKb + detailsSizeKb + imagesSizeKb` across all items. |
| `meta` | Free-form metadata bag. Reserved for future use. |
| `updatedAt` | When the vault row last changed. |

Free-tier callers get `{"vault": null}` instead.
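Because a missing membership shows up as `null` rather than an error status, client code has to branch on the body. A minimal sketch, with the hypothetical helper `remaining_vault_kb` working on an already-parsed response:

```python
from typing import Optional


def remaining_vault_kb(response: dict) -> Optional[int]:
    """Return free vault space in KB, or None when the caller has no vault."""
    vault = response.get("vault")
    if vault is None:
        return None  # free tier: the API answers 200 with {"vault": null}
    return vault["storageKb"] - vault["usedStorageKb"]


print(remaining_vault_kb({"vault": None}))
print(remaining_vault_kb({"vault": {"storageKb": 1000, "usedStorageKb": 250}}))
```

A negative result means the vault is over its current allowance.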

### Example

```bash
curl -H "Authorization: Bearer $CIVITAI_TOKEN" \
  "https://civitai.com/api/v1/vault/get"
```

## List vault items

```
GET /api/v1/vault/all
```

Paginated list of model versions in the caller's vault.

### Query parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `limit` | integer (1–200) | `60` | Items per page. |
| `page` | integer | `1` | 1-indexed page number. |
| `query` | string | — | Case-insensitive substring match against `modelName`, `versionName`, and `creatorName`. |
| `types` | comma-separated `ModelType` values | — | e.g. `Checkpoint,LORA`. |
| `categories` | comma-separated strings | — | Filter by category. |
| `baseModels` | comma-separated strings | — | e.g. `SDXL 1.0,Flux.1 D`. |
| `dateCreatedFrom` / `dateCreatedTo` | ISO date | — | Bound the underlying model version's `createdAt`. |
| `dateAddedFrom` / `dateAddedTo` | ISO date | — | Bound when the item was added to the vault. |
| `sort` | enum | `Recently Added` | One of `Recently Added`, `Recently Created`, `Model Name`, `Model Size`. URL-encode the space. |
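The comma-separated filters and the space in `sort` are easy to get wrong by hand. The sketch below builds the query string with the standard library; the helper name `build_vault_list_url` is hypothetical, and it performs no validation of parameter names or values.

```python
from urllib.parse import quote, urlencode

VAULT_LIST_URL = "https://civitai.com/api/v1/vault/all"


def build_vault_list_url(**filters) -> str:
    """Join list values with commas and percent-encode everything else."""
    params = {}
    for name, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params[name] = value
    # quote_via=quote encodes spaces as %20 instead of "+"; commas stay literal.
    return f"{VAULT_LIST_URL}?{urlencode(params, safe=',', quote_via=quote)}"


print(build_vault_list_url(limit=10, types=["Checkpoint", "LORA"], sort="Recently Added"))
```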

### Response

```json
{
  "items": [
    {
      "id": 9876,
      "vaultId": 12345,
      "status": "Stored",
      "modelVersionId": 2514310,
      "modelId": 827184,
      "modelName": "WAI-illustrious-SDXL",
      "versionName": "v16.0",
      "creatorId": 67890,
      "creatorName": "WAI",
      "type": "Checkpoint",
      "baseModel": "Illustrious",
      "category": "character",
      "modelSizeKb": 6775430,
      "detailsSizeKb": 12,
      "imagesSizeKb": 4096,
      "createdAt": "2025-04-01T08:30:00.000Z",
      "addedAt": "2025-04-01T08:30:00.000Z",
      "refreshedAt": null,
      "notes": null,
      "meta": { "failures": 0 },
      "coverImageUrl": "https://image.civitai.com/.../cover.jpeg",
      "files": [
        { "id": 2402203, "sizeKB": 6775430, "url": "https://...", "displayName": "waiIllustriousSDXL_v160.safetensors" }
      ]
    }
  ],
  "totalItems": 42,
  "currentPage": 1,
  "pageSize": 60,
  "totalPages": 1
}
```

| Field | Description |
|-------|-------------|
| `status` | `Pending`, `Stored`, or `Failed`. Cover image and full files are only available once `Stored`. |
| `modelName` / `versionName` / `creatorName` | Snapshot at vault time. Survive deletion of the original model. |
| `coverImageUrl` | Pre-signed URL to a cover image, or `null` while the item is still pending. |
| `files` | Mirror of the model version's downloadable files at vault time. Each entry has `id`, `sizeKB`, `url`, `displayName`. |
| `category` | Tag-derived category. May be empty string if uncategorised. |
| `meta.failures` | Counter for ingestion retries. Diagnostic only. |

### Example

```bash
curl -H "Authorization: Bearer $CIVITAI_TOKEN" \
  "https://civitai.com/api/v1/vault/all?limit=10&types=LORA"
```

## Check vault membership

```
GET /api/v1/vault/check-vault
```

Bulk lookup: for a list of model version IDs, return which ones the caller
already has in their vault.

### Query parameters

| Name | Type | Description |
|------|------|-------------|
| `modelVersionIds` | comma-separated integers | IDs to check. Required. |

### Response

An array, one entry per requested ID. `vaultItem` is `null` when the version
isn't in the vault, otherwise it's the full vault item record (same shape as
in `/vault/all`).

```json
[
  { "modelVersionId": 2514310, "vaultItem": { "id": 9876, "...": "..." } },
  { "modelVersionId": 2402203, "vaultItem": null }
]
```
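Reducing that array to the set of versions already stored takes one comprehension. A sketch, assuming the response has been parsed into a list of dictionaries; the function name is hypothetical.

```python
def vaulted_version_ids(entries: list) -> set:
    """Collect the model version IDs whose vaultItem is present."""
    return {
        entry["modelVersionId"]
        for entry in entries
        if entry.get("vaultItem") is not None
    }


response = [
    {"modelVersionId": 2514310, "vaultItem": {"id": 9876}},
    {"modelVersionId": 2402203, "vaultItem": None},
]
print(vaulted_version_ids(response))
```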

### Example

```bash
curl -H "Authorization: Bearer $CIVITAI_TOKEN" \
  "https://civitai.com/api/v1/vault/check-vault?modelVersionIds=2514310,2402203"
```

## Add or remove a model version

```
POST /api/v1/vault/toggle-version
```

Toggles a model version in the caller's vault. If it isn't there, it's added;
if it is, it's removed. There's no separate add/remove endpoint — both
operations go through this one.

### Query parameters

| Name | Type | Description |
|------|------|-------------|
| `modelVersionId` | integer | Required. |

### Response

```json
{
  "success": true,
  "vaultId": 12345
}
```

`vaultId` is omitted when the operation removed the item.
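Since one endpoint handles both directions, the response is the only way to tell which one happened. A sketch of that check, with the hypothetical helper `toggle_outcome`; it treats an explicit `null` the same as an omitted `vaultId`, which is an assumption.

```python
def toggle_outcome(response: dict) -> str:
    """Classify a toggle-version response as added, removed, or failed."""
    if not response.get("success"):
        return "failed"
    # vaultId is present only when the version was added to the vault.
    return "added" if response.get("vaultId") is not None else "removed"


print(toggle_outcome({"success": True, "vaultId": 12345}))
print(toggle_outcome({"success": True}))
```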

### Errors

| Status | Cause |
|--------|-------|
| `401` | Missing or invalid token. |
| `400` | Missing or malformed `modelVersionId`. |
| `500` | Storage quota exceeded, model not found, or other internal error — check `error.message`. |

### Example

```bash
curl -X POST -H "Authorization: Bearer $CIVITAI_TOKEN" \
  "https://civitai.com/api/v1/vault/toggle-version?modelVersionId=2514310"
```

::: tip
The "Try It" widget is GET-only, so this endpoint can only be exercised from a
shell. Run the curl above, keeping in mind that it is a toggle and therefore not
idempotent: calling it twice puts the version back in its original state.
:::

---

---
url: /orchestration/recipes/veo3.md
---

# Veo 3 video generation

Google's Veo 3 video generation model, available in two releases (`3.0` and `3.1`) with three speed/cost tiers. The operation (text-to-video, image-to-video, first-last-frame, reference) is inferred from the number of images passed.

| `version` | `mode` | Notes |
|-----------|--------|-------|
| `3.1` | `standard` | Best quality. Default. |
| `3.1` | `fast` | ~40% cheaper, significantly faster. |
| `3.1` | `lite` | ~70% cheaper, fastest. Supports only text-to-video and single image-to-video. |
| `3.0` | `standard` / `fast` | Previous release. Same operations as 3.1 standard/fast; `lite` is 3.1-only. |

**Default choice**: `version: "3.1"`, `mode: "standard"` for maximum quality. Use `mode: "fast"` for iterating; `mode: "lite"` for rapid prototyping or high-volume tasks.

All Veo 3 jobs exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) — always submit with `wait=0`.

## Text-to-video

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "veo3",
      "version": "3.1",
      "prompt": "A lighthouse standing on rocky cliffs at sunset, waves crashing below, cinematic",
      "aspectRatio": "16:9",
      "duration": 8,
      "generateAudio": true
    }
  }]
}
```

## Fast mode

Significantly faster and ~40% cheaper than standard. Good for iteration:

```json
{
  "engine": "veo3",
  "version": "3.1",
  "prompt": "A peaceful forest path in autumn with golden leaves falling",
  "aspectRatio": "16:9",
  "duration": 8,
  "mode": "fast",
  "generateAudio": false
}
```

## Lite mode *(3.1 only)*

Cheapest and fastest tier — roughly 70% cheaper than standard. Supports text-to-video and single image-to-video only:

```json
{
  "engine": "veo3",
  "version": "3.1",
  "prompt": "A busy coffee shop with people working and chatting",
  "aspectRatio": "16:9",
  "duration": 8,
  "mode": "lite",
  "generateAudio": false
}
```

## Image-to-video

Pass one image to animate from that start frame:

```json
{
  "engine": "veo3",
  "version": "3.1",
  "prompt": "The subject slowly turns and looks into the distance",
  "images": ["https://image.civitai.com/.../photo.jpeg"],
  "aspectRatio": "16:9",
  "duration": 8,
  "generateAudio": false
}
```

## First-last-frame interpolation

Pass exactly two images to interpolate between a start and end frame:

```json
{
  "engine": "veo3",
  "version": "3.1",
  "prompt": "A smooth, natural transition between the two scenes",
  "images": [
    "https://example.com/first.jpeg",
    "https://example.com/last.jpeg"
  ],
  "aspectRatio": "16:9",
  "duration": 8,
  "generateAudio": false
}
```

## Reference-to-video

Pass three or more images to use them as style/subject references:

```json
{
  "engine": "veo3",
  "version": "3.1",
  "prompt": "The character walks through a forest in this art style",
  "images": [
    "https://example.com/ref1.jpeg",
    "https://example.com/ref2.jpeg",
    "https://example.com/ref3.jpeg"
  ],
  "duration": 8
}
```

## Operations: how image count determines the operation

| `images[]` length | Operation |
|-------------------|-----------|
| 0 | text-to-video |
| 1 | image-to-video |
| 2 | first-last-frame-to-video |
| 3+ | reference-to-video |

::: warning Lite mode restrictions
`mode: "lite"` only supports text-to-video (0 images) and image-to-video (1 image). Passing 2+ images with `lite` returns a `400`.
:::
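
The mapping above is easy to mirror client-side before submitting. The following is a minimal sketch, not the orchestrator's own validation: `veo3_operation` is a hypothetical helper name, and the server's exact error text may differ from the messages raised here.

```python
def veo3_operation(image_count: int, mode: str = "standard") -> str:
    """Return the operation Veo 3 infers from the number of images.

    Hypothetical client-side helper that mirrors the table above.
    """
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    if image_count == 0:
        operation = "text-to-video"
    elif image_count == 1:
        operation = "image-to-video"
    elif image_count == 2:
        operation = "first-last-frame-to-video"
    else:
        operation = "reference-to-video"
    # Lite only supports 0 or 1 image; the API answers 400 otherwise.
    if mode == "lite" and image_count >= 2:
        raise ValueError(f"mode 'lite' does not support {operation}")
    return operation
```

Calling `veo3_operation(len(images), mode)` before building the request body catches the lite restriction locally instead of on a round trip.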

## Parameters

| Field | Default | Notes |
|-------|---------|-------|
| `engine` | — ✅ | `"veo3"` |
| `version` | `"3.1"` | `"3.0"` or `"3.1"` |
| `mode` | `"standard"` | `"standard"`, `"fast"`, `"lite"`. `lite` is 3.1-only. |
| `prompt` | — ✅ | Generation prompt. |
| `negativePrompt` | `null` | What to avoid. |
| `aspectRatio` | `"16:9"` | `"16:9"`, `"9:16"`, `"1:1"` |
| `duration` | `8` | `4`, `6`, or `8` seconds. |
| `generateAudio` | `true` | Emit a synchronized audio track. Disable to reduce cost by ~33%. |
| `images[]` | `[]` | 0–3+ images. Count determines operation type. |
| `enablePromptEnhancer` | `true` | LLM expands the prompt before generation. |
| `seed` | random | Integer for reproducibility. |

## Cost

Cost scales with `duration`, audio, and mode.

```
total = baseCost(duration) × audioFactor × modeFactor
```

| Duration | baseCost |
|----------|----------|
| 4 s | 1 667 |
| 6 s | 2 500 |
| 8 s | 3 333 |

| `mode` | modeFactor |
|--------|------------|
| `standard` | × 1.0 |
| `fast` | × 0.6 |
| `lite` | × 0.3 |

| `generateAudio` | audioFactor |
|-----------------|-------------|
| `true` (default) | × 1.0 |
| `false` | × 0.67 |

Example costs at **8 s** (Buzz):

| Mode | With audio | Without audio |
|------|------------|---------------|
| `standard` | **3 333** | **2 233** |
| `fast` | **2 000** | **1 333** |
| `lite` | **1 000** | **667** |

Example costs at **4 s** (Buzz):

| Mode | With audio | Without audio |
|------|------------|---------------|
| `standard` | **1 667** | **1 117** |
| `fast` | **1 000** | **667** |
| `lite` | **500** | **333** |
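
The formula and factor tables above translate directly into a small estimator. This is a sketch under stated assumptions: `veo3_cost` is a hypothetical name, and rounding to the nearest whole Buzz is an assumption, so the billed amount may differ from the estimate by a few Buzz. Use `whatif=true` when an exact figure matters.

```python
BASE_COST = {4: 1667, 6: 2500, 8: 3333}  # Buzz, keyed by duration in seconds
MODE_FACTOR = {"standard": 1.0, "fast": 0.6, "lite": 0.3}
AUDIO_FACTOR = {True: 1.0, False: 0.67}


def veo3_cost(duration: int = 8, mode: str = "standard",
              generate_audio: bool = True) -> int:
    """Estimate total = baseCost(duration) x audioFactor x modeFactor."""
    if duration not in BASE_COST:
        raise ValueError("duration must be one of 4, 6, 8")
    if mode not in MODE_FACTOR:
        raise ValueError("mode must be standard, fast, or lite")
    total = BASE_COST[duration] * AUDIO_FACTOR[generate_audio] * MODE_FACTOR[mode]
    # Rounding to the nearest Buzz is an assumption, not documented behavior.
    return round(total)
```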

## Reading the result

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
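
Pulling the signed URL out of a response shaped like the one above takes only a few lines. A minimal sketch, assuming the workflow JSON has already been decoded into a dictionary; `video_urls` is a hypothetical helper name.

```python
def video_urls(workflow: dict) -> list:
    """Collect the signed video URL from every succeeded step that has one."""
    urls = []
    for step in workflow.get("steps", []):
        video = (step.get("output") or {}).get("video")
        if step.get("status") == "succeeded" and video and "url" in video:
            urls.append(video["url"])
    return urls
```

Download the file promptly after reading the URL, since the signature expires.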

## Long-running jobs

Standard 8 s typically completes in 3–7 minutes. Fast and Lite are faster. Use `wait=0` + polling or webhooks:

* **Webhooks** (recommended): `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks)
* **Polling**: `GET /v2/consumer/workflows/{workflowId}` every 10–30 s
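
The polling option can be sketched with the standard library alone. The endpoint path and the `succeeded` / `failed` statuses come from this page; the helper names `fetch_workflow` and `wait_for_workflow` are hypothetical, and other terminal statuses may exist that this sketch does not handle.

```python
import json
import time
import urllib.request
from typing import Callable

API = "https://orchestration.civitai.com/v2/consumer/workflows"
TERMINAL = {"succeeded", "failed"}  # statuses shown on this page; others may exist


def fetch_workflow(workflow_id: str, token: str) -> dict:
    """GET the workflow once and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{API}/{workflow_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_workflow(fetch: Callable[[], dict], interval_s: float = 15.0,
                      timeout_s: float = 900.0, sleep=time.sleep) -> dict:
    """Call fetch() until the workflow is terminal or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        workflow = fetch()
        if workflow.get("status") in TERMINAL:
            return workflow
        if time.monotonic() >= deadline:
            raise TimeoutError("workflow still running after timeout")
        sleep(interval_s)
```

Typical use is `wait_for_workflow(lambda: fetch_workflow(workflow_id, token))`. Passing the fetch function in keeps the loop testable without network access.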

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "lite mode only supports..." | Passed 2+ images with `mode: "lite"` | Use `standard` or `fast` for first-last-frame and reference-to-video. |
| `400` with "lite mode requires version 3.1" | Used `mode: "lite"` with `version: "3.0"` | Set `version: "3.1"` to use lite mode. |
| `400` with "duration must be one of" | Sent a duration not in `[4, 6, 8]` | Use only 4, 6, or 8 seconds. |
| Output lacks audio | `generateAudio: false` | Set `generateAudio: true` (the default). |
| Step `failed`, `reason = "no_provider_available"` | Google API queue busy | Retry shortly. |
| Step `failed`, `reason = "blocked"` | Google content policy | Don't retry the same input. |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling
* [WAN video generation](./wan) — open-source alternative
* [Kling video generation](./kling) — another commercial video model

---

---
url: /orchestration/recipes/video-interpolation.md
---

# Video frame interpolation

The `videoInterpolation` step type takes a video and returns a version with more frames per second, using **[VFIMamba](https://huggingface.co/MCG-NJU/VFIMamba)** — a frame-interpolation model that synthesizes intermediate frames between existing ones. `interpolationFactor: 2` doubles the frame count; `interpolationFactor: 3` triples it. Resolution and duration stay the same — only the frame rate changes, giving you smoother motion.

Common uses:

* **Smooth out generated video** — most video-gen models output at 16 or 24 FPS; interpolate to 48–72 FPS for smoother playback.
* **Rescue low-framerate source** — older footage at 24 FPS or hand-drawn animation at 12 FPS.
* **Full polish pass** — chain `videoGen` → `videoInterpolation` → `videoUpscaler` for a higher-res, higher-FPS output from a short gen.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A source video URL — publicly fetchable by the orchestrator (Civitai CDN URLs work directly)

## The simplest request

Use the per-recipe endpoint when you just want to smooth one clip and don't need webhooks or multi-step chaining:

```http
POST https://orchestration.civitai.com/v2/consumer/recipes/videoInterpolation?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "video": "https://.../input.mp4"
}
```

Defaults apply `interpolationFactor: 2`. The response is a full [`Workflow`](/orchestration/reference/operations/GetWorkflow) whose single step carries the smoothed video blob.

::: tip Use `wait=0` for video
VFIMamba processes frame-by-frame and scales with clip length; a multi-second clip almost always exceeds the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline). Submit with `wait=0`, then poll or [subscribe via webhook](/orchestration/guide/results-and-webhooks).
:::

## Via the generic workflow endpoint

Equivalent request through [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — use this path when you need webhooks, tags, or to chain with other steps:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoInterpolation",
    "input": {
      "video": "https://.../input.mp4",
      "interpolationFactor": 2
    }
  }]
}
```

## Input fields

See the [`VideoInterpolationInput` schema](/orchestration/reference/operations/InvokeVideoInterpolationStepTemplate) for the full definition.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `video` | ✅ | — | URL of the source video. Must be publicly fetchable without auth. Single video stream only — multi-track sources are rejected. |
| `interpolationFactor` | | `2` | Integer `2` or `3`. Output frame count ≈ `input × interpolationFactor`. |
| `model` | | `VFIMamba` | Currently the only supported model; leave as default. |

### Picking an interpolation factor

`interpolationFactor: 2` is the safe default — it doubles the frame count (e.g., 24 FPS → 48 FPS) and produces reliably smooth motion. `3` triples frames and works well on low-motion content, but can introduce artifacts on fast-moving or heavily-compressed sources. Start at `2` and only step up after visually confirming the output holds up.

### Source resolution limit

VFIMamba enforces a **2048 px hard cap on either axis** of the source — width AND height must each be ≤ 2048 before interpolation. The orchestrator probes your source at submit time and rejects the request (`400 Bad Request`) if it's larger.

If your source is 4K (3840×2160), downscale first via [`transcode`](/orchestration/reference/operations/InvokeTranscodeStepTemplate), then interpolate. Interpolation itself does not change resolution, so you can upscale afterwards if needed.
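
A client can run the same size check before submitting and work out a downscale target for the transcode step. A minimal sketch: the function names are hypothetical, and rounding down to even dimensions is an assumption (many video codecs expect even sizes), not a documented requirement.

```python
MAX_SIDE = 2048  # VFIMamba input cap per axis, as stated above


def fits_interpolation_cap(width: int, height: int) -> bool:
    """True when both axes are within the 2048 px cap."""
    return width <= MAX_SIDE and height <= MAX_SIDE


def downscale_target(width: int, height: int) -> tuple:
    """Largest size within the cap that keeps the aspect ratio."""
    if fits_interpolation_cap(width, height):
        return width, height
    longest = max(width, height)
    # Integer arithmetic avoids floating-point rounding surprises.
    new_width = width * MAX_SIDE // longest
    new_height = height * MAX_SIDE // longest
    # Assumption: round down to even dimensions for codec compatibility.
    return new_width - new_width % 2, new_height - new_height % 2
```

For a 3840×2160 source this suggests 2048×1152 as the size to transcode to before interpolating.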

## Chaining: generate then smooth

The most common two-step flow — generate a short clip at the model's native FPS, then interpolate to a higher frame rate:

```json
{
  "steps": [
    {
      "$type": "videoGen",
      "name": "clip",
      "input": {
        "engine": "ltx2.3",
        "operation": "createVideo",
        "model": "22b-distilled",
        "prompt": "A calm mountain lake at dawn, slow cinematic pan",
        "duration": 5,
        "width": 1280,
        "height": 720,
        "fps": 24,
        "generateAudio": false,
        "guidanceScale": 4,
        "numInferenceSteps": 20
      }
    },
    {
      "$type": "videoInterpolation",
      "name": "clip-smooth",
      "input": {
        "video": { "$ref": "clip", "path": "output.video.url" },
        "interpolationFactor": 2
      }
    }
  ]
}
```

The `{ "$ref": "clip", "path": "output.video.url" }` reference creates a dependency — `clip-smooth` doesn't start until `clip` succeeds, and the interpolator's `video` field is filled in with the generated clip's signed URL at runtime. See [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) for the full reference syntax.
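
The same chain can be assembled programmatically. A minimal sketch: `ref` and `generate_then_smooth` are hypothetical helper names rather than part of any Civitai SDK, and the step names match the JSON above.

```python
import json


def ref(step_name: str, path: str) -> dict:
    """Build a reference to another step's output."""
    return {"$ref": step_name, "path": path}


def generate_then_smooth(gen_input: dict, factor: int = 2) -> dict:
    """Return a two-step workflow body: videoGen, then videoInterpolation."""
    return {
        "steps": [
            {"$type": "videoGen", "name": "clip", "input": gen_input},
            {
                "$type": "videoInterpolation",
                "name": "clip-smooth",
                "input": {
                    # The reference makes clip-smooth wait for clip.
                    "video": ref("clip", "output.video.url"),
                    "interpolationFactor": factor,
                },
            },
        ]
    }


if __name__ == "__main__":
    body = generate_then_smooth({"engine": "ltx2.3", "prompt": "A calm mountain lake at dawn"})
    print(json.dumps(body, indent=2))
```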

## Full polish pass: generate → interpolate → upscale

For the highest-quality short clips, chain all three steps. Order matters — **interpolation must happen before upscaling**, because VFIMamba's 2048 px input cap is tighter than the upscaler's 2560 px output cap. Generating at 1280×720, interpolating at that size (within the 2048 cap), then upscaling 2× to 2560×1440 (at the 2560 cap) satisfies both:

```json
{
  "steps": [
    {
      "$type": "videoGen",
      "name": "clip",
      "input": {
        "engine": "ltx2.3",
        "operation": "createVideo",
        "model": "22b-distilled",
        "prompt": "Neon-lit city street at night, slow dolly forward",
        "duration": 5,
        "width": 1280,
        "height": 720,
        "fps": 24,
        "generateAudio": false,
        "guidanceScale": 4,
        "numInferenceSteps": 20
      }
    },
    {
      "$type": "videoInterpolation",
      "name": "clip-smooth",
      "input": {
        "video": { "$ref": "clip", "path": "output.video.url" },
        "interpolationFactor": 2
      }
    },
    {
      "$type": "videoUpscaler",
      "name": "clip-polished",
      "input": {
        "video": { "$ref": "clip-smooth", "path": "output.video.url" },
        "scaleFactor": 2
      }
    }
  ]
}
```

What happens at runtime:

1. **`clip`** generates a 5-second 1280×720 clip at 24 FPS with LTX2.3 (`22b-distilled` for speed).
2. **`clip-smooth`** doubles the frame count → ~48 FPS, same 1280×720 resolution and duration — comfortably under VFIMamba's 2048 px cap.
3. **`clip-polished`** upscales 2× → 2560×1440, landing exactly at the [upscaler cap](./video-upscaler#picking-a-scale-factor).

Flipping the order (upscale then interpolate) would produce a 2560×1440 intermediate that VFIMamba *won't accept* — its 2048 px cap rejects it at submit time with a `400`.
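
The ordering argument can be checked numerically. A minimal sketch that applies the two caps named above to either order; `check_chain` is a hypothetical helper and the returned messages are illustrative, not the orchestrator's error text.

```python
INTERPOLATION_INPUT_CAP = 2048  # px per axis, VFIMamba input
UPSCALER_OUTPUT_CAP = 2560      # px per axis, upscaler output


def check_chain(width: int, height: int, scale_factor: int,
                interpolate_first: bool = True) -> list:
    """Return the cap violations for a chain; an empty list means it fits."""
    problems = []
    out_width, out_height = width * scale_factor, height * scale_factor
    if max(out_width, out_height) > UPSCALER_OUTPUT_CAP:
        problems.append(f"upscaled {out_width}x{out_height} exceeds 2560 px")
    # The interpolator sees the source size, or the upscaled size if it runs second.
    if interpolate_first:
        seen_width, seen_height = width, height
    else:
        seen_width, seen_height = out_width, out_height
    if max(seen_width, seen_height) > INTERPOLATION_INPUT_CAP:
        problems.append(f"interpolation input {seen_width}x{seen_height} exceeds 2048 px")
    return problems
```

With a 1280×720 source and `scale_factor=2`, interpolating first returns no problems, while upscaling first reports the 2560×1440 interpolation input.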

Because the combined workflow will almost certainly exceed the 100-second request limit, submit with `wait=0` and poll; the built-in **Try It** widget does this automatically.

## Reading the result

A successful `videoInterpolation` step emits a single video blob at the same resolution as the input:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoInterpolation",
    "status": "succeeded",
    "output": {
      "video": {
        "id": "blob_...",
        "url": "https://.../signed.mp4",
        "type": "video",
        "width": 1280,
        "height": 720
      }
    }
  }]
}
```

Note: `videoInterpolation` output is `video` (singular VideoBlob), not a collection. The reported `width` / `height` mirror the source — interpolation only changes frame count, not pixel dimensions.

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) to get a fresh URL.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

VFIMamba's cost scales with input pixel-frame volume, with a fixed overhead per call:

```
totalFrames        = durationSeconds × fps
pixelFrameProduct  = width × height × totalFrames / 1 000 000

total = C0 + C1 × pixelFrameProduct
        where (C0, C1) = (2.188, 0.29297)   if interpolationFactor == 2
              (C0, C1) = (0.324, 0.51379)   if interpolationFactor == 3
```

| Shape | Buzz |
|-------|------|
| 5 s @ 720p, 24 fps, `interpolationFactor: 2` | ~35 |
| 10 s @ 720p, 24 fps, `interpolationFactor: 2` | ~67 |
| 10 s @ 1080p, 30 fps, `interpolationFactor: 2` | ~184 |
| 10 s @ 720p, 24 fps, `interpolationFactor: 3` | ~114 |

`interpolationFactor: 3` raises the per-frame cost coefficient by about 1.75× (0.51379 vs 0.29297), so plan on roughly 1.7× the price of `2` at typical clip sizes. Beyond the fixed overhead, cost scales linearly with pixel count and duration.
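
The formula above can be evaluated directly. A minimal sketch using the published coefficients; `interpolation_cost` is a hypothetical name, and the result is an unrounded estimate, so use `whatif=true` for the billed figure.

```python
COEFFICIENTS = {2: (2.188, 0.29297), 3: (0.324, 0.51379)}  # (C0, C1) per factor


def interpolation_cost(width: int, height: int, duration_s: float,
                       fps: float, factor: int = 2) -> float:
    """Estimate Buzz as C0 + C1 x (width x height x frames / 1e6)."""
    if factor not in COEFFICIENTS:
        raise ValueError("interpolationFactor must be 2 or 3")
    c0, c1 = COEFFICIENTS[factor]
    total_frames = duration_s * fps
    pixel_frame_product = width * height * total_frames / 1_000_000
    return c0 + c1 * pixel_frame_product
```

For a 10-second 1280×720 clip at 24 fps this gives about 67 Buzz at factor `2` and about 114 at factor `3`.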

## Runtime

VFIMamba's runtime scales roughly linearly with **input-frame-count × resolution**. A 5-second 720p clip at 24 FPS (120 frames) at `interpolationFactor: 2` generates ~120 new frames and typically takes a couple of minutes end-to-end including queue time. `interpolationFactor: 3` does ~2× the work. Always submit with `wait=0` plus [webhooks or polling](/orchestration/guide/results-and-webhooks); a synchronous `wait=90` will time out on most realistic inputs.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "video could not be loaded" | URL not publicly reachable | Make sure the URL is fetchable without auth; avoid signed URLs that expire quickly. |
| `400` with "Video resolution (…) exceeds maximum supported resolution (2048x2048)" | Source is wider or taller than 2048 px | Downscale first via [`transcode`](/orchestration/reference/operations/InvokeTranscodeStepTemplate), then interpolate. |
| `400` with "Only 1 video stream is supported" | Multi-track source (e.g., camera with picture-in-picture) | Re-encode the source to a single video stream before submitting. |
| `400` with "interpolationFactor out of range" | Value outside `2`–`3` | Clamp client-side. VFIMamba only supports 2× or 3×. |
| `400` with "Unable to analyze video file" | Source couldn't be probed (corrupt, wrong container, network error during probe) | Check the URL resolves and serves valid MP4/WebM; re-upload if the source is corrupt. |
| Output has artifacts / ghosting on fast motion | `interpolationFactor: 3` too aggressive for high-motion content | Drop to `2`, or pre-stabilize the source. |
| Step `failed`, `reason = "blocked"` | Source video hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |
| Request timed out (`wait` expired) | VFIMamba too slow to finish in the synchronous window | Resubmit with `wait=0` and poll, or register a webhook. |

## Related

* [`InvokeVideoInterpolationStepTemplate`](/orchestration/reference/operations/InvokeVideoInterpolationStepTemplate) — the per-recipe endpoint
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/videoInterpolation/openapi.yaml) — standalone OpenAPI 3.1 YAML for this endpoint, ready to import into Postman / Insomnia / OpenAPI Generator
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — generic path for chaining
* [Video upscaling](./video-upscaler) — the `videoUpscaler` recipe for increasing resolution
* [WAN video generation](./wan) — generate clips to feed into this recipe
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running workflows
* [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) — how the `$ref` references work

---

---
url: /orchestration/recipes/video-upscaler.md
---

# Video upscaling

The `videoUpscaler` step type takes a video and returns a higher-resolution version using **[FlashVSR](https://huggingface.co/JunhaoZhuang/FlashVSR)**, a real-time video super-resolution model. One model, one knob: `scaleFactor` multiplies both dimensions by `2`, `3`, or `4`. A 720p input upscaled at `scaleFactor: 2` becomes 1440p. Because output is capped at 2560 px per side, factors of `3` and `4` only fit smaller sources: a 640×360 input at `scaleFactor: 4` also lands at 2560×1440.

Common uses:

* Finishing step after video generation (chain `videoGen` → `videoUpscaler`)
* Rescuing low-resolution source clips
* Preparing clips for large-format display

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A source video URL — publicly fetchable by the orchestrator (Civitai CDN URLs work directly)

## The simplest request

Use the per-recipe endpoint when you're upscaling a single video and don't need webhooks or multi-step chaining:

```http
POST https://orchestration.civitai.com/v2/consumer/recipes/videoUpscaler?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "video": "https://.../input.mp4"
}
```

Defaults apply `scaleFactor: 2`. The response is a full [`Workflow`](/orchestration/reference/operations/GetWorkflow) whose single step carries the upscaled video blob.

::: tip Use `wait=0` for video
FlashVSR on a multi-second clip almost always exceeds the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline). Submit with `wait=0`, then poll or [subscribe via webhook](/orchestration/guide/results-and-webhooks).
:::

## Via the generic workflow endpoint

Equivalent request through [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — use this path when you need webhooks, tags, or to chain with other steps:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoUpscaler",
    "input": {
      "video": "https://.../input.mp4",
      "scaleFactor": 2
    }
  }]
}
```

## Input fields

See the [`VideoUpscalerInput` schema](/orchestration/reference/operations/InvokeVideoUpscalerStepTemplate) for the full definition.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `video` | ✅ | — | URL of the source video. Must be publicly fetchable without auth. |
| `scaleFactor` | | `2` | Integer `2`–`4`. Output dimensions are `input × scaleFactor` on both axes. |

### Picking a scale factor

FlashVSR applies a single pass at your chosen scale — there's no equivalent to image-upscaling's `numberOfRepeats`. Higher factors quadratically increase output pixels *and* runtime.

::: warning Output is capped at 2560 px per side
The orchestrator probes your source at submit time and **rejects the request** (`400 Bad Request`) if `width × scaleFactor` or `height × scaleFactor` would exceed **2560**. This keeps FlashVSR within a shape it can reliably deliver.
:::

Practical combinations that land inside the cap:

| Source | Max `scaleFactor` | Upscaled output |
|--------|-------------------|-----------------|
| 480p (854×480) | `2` | 1708×960 |
| 540p (960×540) | `2` | 1920×1080 |
| 720p (1280×720) | `2` | 2560×1440 *(exactly at cap)* |
| 640×360 | `4` | 2560×1440 *(exactly at cap)* |
| 1080p (1920×1080) | — | already too large; transcode down before upscaling |

Rule of thumb: start at `scaleFactor: 2` and only step up when the source is small enough that the output still fits under 2560 px. The visual gains between `2` and `4` are usually smaller than the runtime cost implies.
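
The cap check reduces to a few multiplications, so it is cheap to run before submitting. A minimal sketch: `max_scale_factor` is a hypothetical helper that returns the largest supported factor whose output stays within 2560 px on both axes.

```python
from typing import Optional

OUTPUT_CAP = 2560  # px per side, as stated in the warning above


def max_scale_factor(width: int, height: int) -> Optional[int]:
    """Largest scaleFactor in 2..4 that fits the cap, or None if none fits."""
    for factor in (4, 3, 2):
        if width * factor <= OUTPUT_CAP and height * factor <= OUTPUT_CAP:
            return factor
    return None  # the source must be transcoded down first
```

A `None` result corresponds to the last table row: the source is already too large for any supported factor.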

## Chaining: generate then upscale

The most common two-step video flow — generate at a manageable resolution, then upscale in a single submission:

```json
{
  "steps": [
    {
      "$type": "videoGen",
      "name": "clip",
      "input": {
        "engine": "ltx2.3",
        "operation": "createVideo",
        "model": "22b-distilled",
        "prompt": "A calm mountain lake at dawn, slow cinematic pan",
        "duration": 5,
        "width": 1280,
        "height": 720,
        "fps": 24,
        "generateAudio": false,
        "guidanceScale": 4,
        "numInferenceSteps": 20
      }
    },
    {
      "$type": "videoUpscaler",
      "name": "clip-hd",
      "input": {
        "video": { "$ref": "clip", "path": "output.video.url" },
        "scaleFactor": 2
      }
    }
  ]
}
```

The `{ "$ref": "clip", "path": "output.video.url" }` reference creates a dependency — `clip-hd` doesn't start until `clip` succeeds, and the upscaler's `video` field is filled in with the generated clip's signed URL at runtime. See [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) for the full reference syntax.

Because the combined workflow easily runs past the 100-second request limit, submit with `wait=0` and poll — the built-in **Try It** widget does this automatically.

## Reading the result

A successful `videoUpscaler` step emits a single upscaled video blob:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoUpscaler",
    "status": "succeeded",
    "output": {
      "video": {
        "id": "blob_...",
        "url": "https://.../signed.mp4",
        "type": "video",
        "width": 2560,
        "height": 1440
      }
    }
  }]
}
```

Note: `videoUpscaler` output is `video` (singular VideoBlob with `width` / `height`), not a collection — the step always returns exactly one clip.

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) to get a fresh URL.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

`videoUpscaler` uses an empirical polynomial fit to real FlashVSR runtimes — cost scales with both the input pixel-frame product and the `scaleFactor`:

```
Pin = totalFrames × (width × height)        // frames × pixels
D   = −16.35
    + −1.15e-6  × Pin
    +  0.4366   × scaleFactor
    + −1.44e-16 × Pin²
    +  1.08e-6  × (Pin × scaleFactor)
    +  2.73     × scaleFactor²

total = max(1, D)
```

Because the polynomial has negative low-order coefficients, **small inputs floor at ~1–2 Buzz**, while larger inputs grow mainly through the `Pin × scaleFactor` and `scaleFactor²` terms (the `Pin²` coefficient is slightly negative). A couple of realistic shapes:

| Source | `scaleFactor` | Estimated Buzz |
|--------|---------------|----------------|
| 5 s @ 720p, 24 fps (~120 frames) | `2` | low tens of Buzz |
| 5 s @ 720p, 24 fps | `3` | ~2× the scale-2 cost |
| 10 s @ 1080p, 30 fps | `2` | low hundreds |
| 10 s @ 1080p, 30 fps | `4` | *rejected* ([see the 2560 px cap](#picking-a-scale-factor)) |

Always run `whatif=true` before a large upscale — the polynomial grows fast once you pass 1 megapixel × hundreds of frames, and stacking `scaleFactor: 3` or `4` compounds it.

## Runtime

FlashVSR is GPU-heavy and scales with both source resolution *and* duration. A 5-second 720p clip at `scaleFactor: 2` typically takes a few minutes end-to-end including queue time; `scaleFactor: 4` can easily be 10×+ that. Always submit with `wait=0` plus [webhooks or polling](/orchestration/guide/results-and-webhooks) — a synchronous `wait=90` will almost always time out on real inputs.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "video could not be loaded" | URL not publicly reachable | Make sure the URL is fetchable without auth; avoid signed URLs that expire quickly. |
| `400` with "Upscaled resolution (…) exceeds maximum supported resolution (2560x2560)" | `source_dim × scaleFactor` > 2560 on either axis | Lower `scaleFactor`, or transcode the source to a smaller resolution first (see [`transcode`](/orchestration/reference/operations/InvokeTranscodeStepTemplate)) before upscaling. |
| `400` with "scaleFactor out of range" | Value outside `2`–`4` | Clamp client-side. FlashVSR doesn't support `1×` (identity) or >`4×`. |
| `400` with "Unable to analyze video file" | Source couldn't be probed (corrupt, wrong container, network error during probe) | Check the URL resolves and serves valid MP4/WebM; re-upload if the source is corrupt. |
| Step `failed`, step-level `reason` mentions unsupported codec | Unusual container or codec in source | Transcode the source to H.264 MP4 first (see the [`transcode` recipe](/orchestration/reference/operations/InvokeTranscodeStepTemplate)), then upscale. |
| Step `failed`, `reason = "blocked"` | Source video hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |
| Request timed out (`wait` expired) | FlashVSR too slow to finish in the synchronous window | Resubmit with `wait=0` and poll, or register a webhook. |

## Related

* [`InvokeVideoUpscalerStepTemplate`](/orchestration/reference/operations/InvokeVideoUpscalerStepTemplate) — the per-recipe endpoint
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/videoUpscaler/openapi.yaml) — standalone OpenAPI 3.1 YAML for this endpoint, ready to import into Postman / Insomnia / OpenAPI Generator
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — generic path for chaining
* [Image upscaling](./image-upscaler) — the `imageUpscaler` equivalent for images
* [WAN video generation](./wan) — generate clips to feed into this recipe
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running workflows
* [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) — how the `$ref` references work

---

---
url: /orchestration/recipes/vidu.md
---

# Vidu video generation

Vidu's video-generation models are available in two engines:

| `engine` | Notes |
|----------|-------|
| `vidu` | Vidu 2.0 (`default` / `q1` models). Flat 600 Buzz. Text-to-video, image-to-video, first-last-frame interpolation, anime style. |
| `vidu-q3` | Vidu Q3. Per-second pricing, 4 resolution tiers, turbo mode, native audio, first-last-frame support. |

**Default choice for new integrations**: `engine: "vidu-q3"` for its per-second pricing and output quality. Use `engine: "vidu"` for simple text-to-video or anime-style clips where a predictable flat cost matters more.

All Vidu jobs exceed the [100-second timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) — always submit with `wait=0`.

## Vidu (`engine: "vidu"`)

### Text-to-video

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "vidu",
      "prompt": "A cat sitting on a windowsill watching rain fall outside",
      "duration": 4,
      "aspectRatio": "16:9",
      "style": "General"
    }
  }]
}
```

### Image-to-video

Pass one image in `images[]` to animate it. The first image is the start frame; the second (optional) is the end frame:

```json
{
  "engine": "vidu",
  "prompt": "The subject looks up and smiles warmly",
  "images": ["https://image.civitai.com/.../photo.jpeg"],
  "duration": 4,
  "aspectRatio": "16:9"
}
```

### First-last-frame interpolation

Pass two images to interpolate between a start and end frame:

```json
{
  "engine": "vidu",
  "prompt": "Smooth transition from morning to evening",
  "images": [
    "https://example.com/start.jpeg",
    "https://example.com/end.jpeg"
  ],
  "duration": 4
}
```

### Anime style

```json
{
  "engine": "vidu",
  "prompt": "Cherry blossoms falling gently in the breeze",
  "duration": 4,
  "style": "Anime"
}
```

### Parameters

| Field | Default | Notes |
|-------|---------|-------|
| `engine` | — ✅ | `"vidu"` |
| `prompt` | — ✅ | Generation prompt. |
| `model` | `"default"` | `"default"`, `"q1"`. `"q3"` is the separate `vidu-q3` engine. |
| `duration` | `4` | `4` or `8` seconds. |
| `aspectRatio` | `null` | `"16:9"`, `"9:16"`, `"1:1"`. Inferred from image if omitted. |
| `style` | `"General"` | `"General"` or `"Anime"`. |
| `images[]` | `[]` | Up to 2 images (start frame / end frame). |
| `movementAmplitude` | `null` | `"auto"`, `"small"`, `"medium"`, `"large"`. |
| `enableBackgroundMusic` | `false` | Add background music to the output. |
| `enablePromptEnhancer` | `true` | LLM expands the prompt before generation. |

### Cost

Flat **600 Buzz** per clip, regardless of duration, style, or model.

***

## Vidu Q3 (`engine: "vidu-q3"`)

Vidu Q3 offers finer resolution control, a turbo speed tier, native audio generation, and per-second pricing.

### Text-to-video

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "vidu-q3",
      "prompt": "An eagle soaring over snow-capped mountain peaks at golden hour",
      "duration": 5,
      "resolution": "720p",
      "aspectRatio": "16:9"
    }
  }]
}
```

### Turbo mode

Turbo roughly halves cost and runtime with modest quality reduction:

```json
{
  "engine": "vidu-q3",
  "prompt": "A city street at night with neon lights",
  "duration": 5,
  "resolution": "720p",
  "turbo": true,
  "enableAudio": false
}
```

### First-last-frame interpolation

Pass up to 2 images — the first is the start frame, the second is the end frame:

```json
{
  "engine": "vidu-q3",
  "prompt": "Smooth transition from a rainy day to sunshine",
  "images": [
    "https://example.com/rainy.jpeg",
    "https://example.com/sunny.jpeg"
  ],
  "duration": 5,
  "resolution": "720p"
}
```

::: warning Two-image maximum
Vidu Q3 accepts at most 2 images (start + end frame). Sending more returns a `400`.
:::

### Parameters

| Field | Default | Notes |
|-------|---------|-------|
| `engine` | — ✅ | `"vidu-q3"` |
| `prompt` | — ✅ | Generation prompt. |
| `duration` | `5` | 1–16 seconds. |
| `resolution` | `"720p"` | `"360p"`, `"540p"`, `"720p"`, `"1080p"` |
| `turbo` | `false` | Faster, cheaper generation with modest quality trade-off. |
| `enableAudio` | `true` | Generate synchronized audio in the output. |
| `aspectRatio` | `null` | `"16:9"`, `"9:16"`, `"1:1"`, `"4:3"`, `"3:4"`. Inferred from images if omitted. |
| `images[]` | `[]` | 0–2 images (start + optional end frame). |
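
The limits in the table above can be checked client-side before submitting. The following is a minimal sketch, not the orchestrator's own validation: `check_vidu_q3_input` is a hypothetical helper, and treating `duration` as a plain number is an assumption.

```python
from typing import Any

# Enumerations copied from the parameter table above.
VIDU_Q3_RESOLUTIONS = {"360p", "540p", "720p", "1080p"}
VIDU_Q3_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def check_vidu_q3_input(inp: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    if inp.get("engine") != "vidu-q3":
        problems.append('engine must be "vidu-q3"')
    if not inp.get("prompt"):
        problems.append("prompt is required")
    duration = inp.get("duration", 5)
    # Assumption: duration is numeric; the table only gives the 1-16 range.
    if not (isinstance(duration, (int, float)) and 1 <= duration <= 16):
        problems.append("duration must be between 1 and 16 seconds")
    if inp.get("resolution", "720p") not in VIDU_Q3_RESOLUTIONS:
        problems.append("resolution must be one of 360p, 540p, 720p, 1080p")
    aspect_ratio = inp.get("aspectRatio")
    if aspect_ratio is not None and aspect_ratio not in VIDU_Q3_ASPECT_RATIOS:
        problems.append("aspectRatio is not a supported value")
    if len(inp.get("images", [])) > 2:
        problems.append("images accepts at most 2 entries")
    return problems
```

Running this before `SubmitWorkflow` surfaces the two-image limit locally instead of as a `400`.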

### Cost

Per-second pricing. `total = costPerSecond × duration`.

| Turbo | Resolution | Buzz/s | Example — 5 s |
|-------|------------|--------|---------------|
| No | `360p` / `540p` | 91 | **455** |
| No | `720p` / `1080p` | 200 | **1 000** |
| Yes | `360p` / `540p` | 46 | **230** |
| Yes | `720p` / `1080p` | 100 | **500** |
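
As a rough illustration of the per-second formula, the table can be turned into a lookup. This is a sketch for budgeting only; `vidu_q3_cost` is a hypothetical helper, and a `whatif=true` submission remains the authoritative price.

```python
# Buzz per second, copied from the table above: (turbo, resolution) -> rate.
VIDU_Q3_BUZZ_PER_SECOND = {
    (False, "360p"): 91,
    (False, "540p"): 91,
    (False, "720p"): 200,
    (False, "1080p"): 200,
    (True, "360p"): 46,
    (True, "540p"): 46,
    (True, "720p"): 100,
    (True, "1080p"): 100,
}


def vidu_q3_cost(duration: int, resolution: str = "720p", turbo: bool = False) -> int:
    """total = costPerSecond x duration, using the rates listed above."""
    return VIDU_Q3_BUZZ_PER_SECOND[(turbo, resolution)] * duration


print(vidu_q3_cost(5, "720p"))              # 1000
print(vidu_q3_cost(5, "540p", turbo=True))  # 230
```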

***

## Reading the result

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Long-running jobs

Vidu jobs typically take 1–4 minutes depending on duration and resolution. Use `wait=0` + polling or webhooks:

* **Webhooks** (recommended): `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks)
* **Polling**: `GET /v2/consumer/workflows/{workflowId}` every 10–30 s

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "images maxItems" | More than 2 images on `vidu-q3` | Trim to at most 2 (start + end frame). |
| `400` with "duration must be one of" | Sent `2` or `6` for `vidu` | `vidu` accepts only `4` or `8`. |
| No audio in output | `enableAudio: false` on `vidu-q3` | Set `enableAudio: true` (the default). |
| Step `failed`, `reason = "no_provider_available"` | No Vidu worker available | Retry shortly. |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling
* [WAN video generation](./wan) — comparable alternative
* [Kling video generation](./kling) — another commercial video model

---

---
url: /orchestration/recipes/wan-image.md
---

# WAN image generation

WAN is Alibaba's open video model family — and the same architecture generates images. The orchestrator exposes WAN's image-gen path via `engine: "wan"`, with the model version picked by the `version` field.

For video workloads, see [WAN video generation](./wan), which shares the engine but uses different operations.

| `version` | Notes |
|-----------|-------|
| `v2.2` | **Default** — stable fal.ai-hosted path. Exposes `steps` (default 27) and `acceleration` tier. Supports LoRAs. |
| `v2.2-5b` | 5B-parameter variant of v2.2 — lighter, exposes a `shift` parameter in addition to the base knobs. Default `steps: 40`. |
| `v2.5` | Newer v2.5 release on fal. Simpler knob set than v2.2. |
| `v2.7` | Latest release on fal. Simpler knob set than v2.2. |

**Default choice for new integrations**: `version: "v2.2"`, `provider: "fal"`. Step up to `v2.5` or `v2.7` when you want the newer output, or drop to `v2.2-5b` for lower-cost generation with the `shift` control.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* No checkpoint URN — the ecosystem ships its own models per version

## v2.2 (default)

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "wan",
      "version": "v2.2",
      "provider": "fal",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "imageSize": "square_hd",
      "guidanceScale": 3.5,
      "steps": 27,
      "quantity": 1
    }
  }]
}
```

### Parameters

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `version` | `v2.2` | `v2.2` / `v2.2-5b` / `v2.5` / `v2.7` | Required — picks the model variant. |
| `provider` | `fal` | `fal` | FAL is currently the only provider for WAN image gen. |
| `prompt` | — ✅ | ≥ 1 char | Natural-language works best. |
| `negativePrompt` | *(none)* | string | Optional. |
| `imageSize` | `square_hd` | `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9` (FAL-style enum) | Enum, not width/height. |
| `guidanceScale` | `3.5` | `1`–`10` | |
| `steps` | `27` | `2`–`40` | Not exposed on `v2.5` / `v2.7`. `v2.2-5b` defaults to `40` with a max of `50`. |
| `quantity` | `1` | `1`–`10` | |
| `seed` | random | int32 | |
| `enablePromptExpansion` | `false` | boolean | Model-side prompt expansion. |
| `enableSafetyChecker` | `false` | boolean | |
| `loras[]` | `[]` | array of `{ air, strength }` | LoRA support via the `ImageGenInputLora` shape — `{ "air": "urn:air:…", "strength": 1.0 }`. |

### With acceleration (`v2.2` only)

`v2.2` exposes an `acceleration` tier that trades a small quality hit for substantial speedups. Three levels — use `fast` for a good balance, `faster` when throughput matters more than fidelity:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "wan",
      "version": "v2.2",
      "provider": "fal",
      "prompt": "A cozy cabin in the woods at sunset",
      "imageSize": "square_hd",
      "acceleration": "faster",
      "guidanceScale": 3.5,
      "steps": 27
    }
  }]
}
```

`acceleration` accepts `none` (default) / `fast` / `faster`.

## v2.2-5b (lightweight with shift control)

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "wan",
      "version": "v2.2-5b",
      "provider": "fal",
      "prompt": "A serene mountain landscape with a crystal clear lake at dawn",
      "imageSize": "landscape_16_9",
      "guidanceScale": 3.5,
      "steps": 40,
      "shift": 2
    }
  }]
}
```

Additional knob over `v2.2`:

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `shift` | `2` | `1`–`10` | Controls the WAN "shift" parameter — sampling shift factor. `2` is the tuned default; bumping higher produces smoother but sometimes softer output. |

Default `steps` is `40` (up from `27` on `v2.2`); max is `50`.

## v2.5 and v2.7 (newer releases, simpler knobs)

Both expose the base shared surface (`prompt`, `negativePrompt`, `imageSize`, `guidanceScale`, `quantity`, `seed`, `loras`, prompt-expansion / safety-checker toggles) without exposing `steps`, `acceleration`, or `shift`:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "wan",
      "version": "v2.5",
      "provider": "fal",
      "prompt": "A cinematic sci-fi cityscape at sunset, neon lighting",
      "imageSize": "landscape_16_9",
      "guidanceScale": 3.5
    }
  }]
}
```

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "wan",
      "version": "v2.7",
      "provider": "fal",
      "prompt": "An epic fantasy dragon perched on a mountain peak at dawn",
      "imageSize": "landscape_16_9",
      "guidanceScale": 3.5
    }
  }]
}
```

Pick `v2.7` for the latest, `v2.5` if you've validated it against your workload and want to pin to it.
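
Since each version exposes a different subset of `steps`, `acceleration`, and `shift`, a small pre-flight check can catch fields that would be rejected. This is a sketch built only from the per-version descriptions on this page; `unsupported_fields` is a hypothetical helper, and the published schemas remain the source of truth.

```python
# Version-specific knobs, as described in the sections above.
WAN_IMAGE_EXTRA_FIELDS = {
    "v2.2": {"steps", "acceleration"},
    "v2.2-5b": {"steps", "shift"},
    "v2.5": set(),
    "v2.7": set(),
}
VERSION_SPECIFIC_FIELDS = {"steps", "acceleration", "shift"}


def unsupported_fields(inp: dict) -> list[str]:
    """List version-specific fields present in `inp` that its version does not expose."""
    version = inp.get("version", "v2.2")
    if version not in WAN_IMAGE_EXTRA_FIELDS:
        raise ValueError(f"unknown WAN image version: {version!r}")
    allowed = WAN_IMAGE_EXTRA_FIELDS[version]
    return sorted(
        field for field in VERSION_SPECIFIC_FIELDS if field in inp and field not in allowed
    )


print(unsupported_fields({"version": "v2.5", "steps": 27, "acceleration": "fast"}))
# ['acceleration', 'steps']
```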

## Reading the result

All WAN versions emit the standard `imageGen` output:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.png" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
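
A minimal sketch of pulling the signed URLs out of a response shaped like the one above. `image_urls` is a hypothetical helper; it assumes only the fields shown in the example.

```python
def image_urls(workflow: dict) -> list[str]:
    """Collect image blob URLs from every succeeded step of a workflow response."""
    urls = []
    for step in workflow.get("steps", []):
        if step.get("status") != "succeeded":
            continue  # failed or pending steps carry no usable output
        output = step.get("output") or {}
        for image in output.get("images", []):
            urls.append(image["url"])
    return urls
```

Because the URLs expire, call this on a freshly fetched workflow rather than on a stored copy.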

## Runtime

FAL queue is the dominant factor. Typical wall times for `quantity: 1`:

| Version | Wall time (no acceleration) | With `acceleration: fast` / `faster` |
|---------|----------------------------|--------------------------------------|
| `v2.2` | 15–40 s | 7–15 s |
| `v2.2-5b` | 10–25 s | (no acceleration) |
| `v2.5` | 15–40 s | (no acceleration) |
| `v2.7` | 15–40 s | (no acceleration) |

Submit `wait=0` + poll for large `quantity` or busy FAL periods.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Flat per-image pricing by `version`, with LoRA usage doubling the base on `v2.2`:

```
total = base × quantity
```

| Version | Base (per image) | Notes |
|---------|------------------|-------|
| `v2.2` | **150** (no LoRA) / **300** (with LoRA) | LoRA-enabled endpoint roughly 2× the price. |
| `v2.2-5b` | **~100** | Lighter variant, lower cost. |
| `v2.5` | **~32.5** | Cheapest of the WAN image tiers. |
| `v2.7` | **~39** standard / **~97.5** pro | |

Examples:

* `v2.2`, `quantity: 1`, no LoRA → **~150 Buzz**
* `v2.2`, `quantity: 2`, with `loras: [{…}]` → ~600 Buzz
* `v2.5`, `quantity: 4` → ~130 Buzz
* `v2.7`, `quantity: 1` → **~39 Buzz**

Dimensions (`imageSize` enum), `steps`, and `acceleration` don't change the Buzz price — they affect runtime but the provider charges flat per-image per version.
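
The flat pricing above can be sketched as a lookup for rough budgeting. The bases are the approximate figures from the table, the `v2.7` pro tier is left out, and `wan_image_cost` is a hypothetical helper; use `whatif=true` for the exact charge.

```python
# Approximate per-image bases from the table above (v2.7 pro tier not modeled).
WAN_IMAGE_BASE_BUZZ = {"v2.2": 150, "v2.2-5b": 100, "v2.5": 32.5, "v2.7": 39}
WAN_V22_LORA_BASE_BUZZ = 300


def wan_image_cost(version: str, quantity: int = 1, with_lora: bool = False) -> float:
    """total = base x quantity; LoRA usage only changes the base on v2.2."""
    base = WAN_IMAGE_BASE_BUZZ[version]
    if version == "v2.2" and with_lora:
        base = WAN_V22_LORA_BASE_BUZZ
    return base * quantity


print(wan_image_cost("v2.2", quantity=2, with_lora=True))  # 600
print(wan_image_cost("v2.5", quantity=4))                  # 130.0
```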

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "version must be one of" | Typo or using a WAN video version number | Use `v2.2`, `v2.2-5b`, `v2.5`, or `v2.7`. |
| `400` with "provider must be fal" | Other providers aren't exposed yet | Stick with `fal`. |
| `400` with "acceleration is not a valid property" | Only `v2.2` exposes `acceleration` | Remove the field on v2.5/v2.7/v2.2-5b. |
| `400` with "shift is not a valid property" | Only `v2.2-5b` exposes `shift` | Remove the field on other versions. |
| `400` with "imageSize must be one of" | Sent width/height like other ecosystems | WAN uses FAL's enum — pick `square_hd`, `landscape_16_9`, etc. Use a different engine (Flux 2, Qwen, etc.) if you need arbitrary dimensions. |
| LoRA has no effect | Wrong AIR URN, or incompatible ecosystem | WAN LoRAs must be tagged for the WAN ecosystem and compatible with the version you're running. |
| Request timed out (`wait` expired) | Large `quantity` or busy FAL queue | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Prompt hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [WAN video generation](./wan) — WAN for videoGen (same engine, different operation)
* [Flux 2](./flux2) / [Qwen](./qwen) / [SDXL](./sdxl) — open-weights / sdcpp alternatives with width/height control
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* Full parameter catalog: the `Wan22FalImageGenInput`, `Wan225bFalImageGenInput`, `Wan25FalImageGenInput`, `Wan27FalImageGenInput` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface

---

---
url: /orchestration/recipes/wan.md
---

# WAN video generation

WAN is Alibaba's open video-generation model family. The orchestrator exposes every shipped version, across multiple providers, under a single `videoGen` step. This recipe walks through the full surface: which version to pick, which provider to route to, and how to invoke each operation.

## Versions at a glance

| `version` | Providers | Operations | Notes |
|-----------|-----------|------------|-------|
| `v2.7` | `fal` | `text-to-video`, `image-to-video`, `reference-to-video`, `edit-video` | Current flagship on FAL. Adds `edit-video`. |
| `v2.6` | `fal` | `text-to-video`, `image-to-video`, `reference-to-video` | FAL production default for new integrations. |
| `v2.5` | `fal` | `text-to-video`, `image-to-video` | Still supported; fewer operations than 2.6/2.7. |
| `v2.2` | `fal`, `comfy` | `text-to-video`, `image-to-video` | Only version with a native ComfyUI path. Supports LoRAs + Turbo mode. |
| `v2.1` | `fal`, `civitai` | `text-to-video`, `image-to-video` | Legacy — prefer 2.6+ unless you specifically need Civitai-hosted inference. |

**Default choice for new integrations**: `version: "v2.6"`, `provider: "fal"`.

## The request shape

Every WAN request is a single `videoGen` step on [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). Four keys select which WAN variant runs:

```json
{
  "$type": "videoGen",
  "input": {
    "engine":    "wan",
    "version":   "v2.6",         // v2.1 | v2.2 | v2.5 | v2.6 | v2.7
    "provider":  "fal",          // fal | comfy | civitai (version-dependent)
    "operation": "text-to-video" // see table above
  }
}
```

The orchestrator dispatches to the matching input schema (`Wan26FalTextToVideoInput`, `Wan22ComfyVideoGenInput`, etc.), so only the fields valid for that combination are accepted — [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) will `400` on unknown ones.
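
To avoid a round trip that ends in a `400`, the version table can be mirrored client-side. This is a sketch under one assumption: the table lists providers and operations per version but does not say that every provider supports every operation, so a `True` here is necessary rather than sufficient. `is_listed_combination` is a hypothetical helper.

```python
T2V, I2V, R2V, EDIT = "text-to-video", "image-to-video", "reference-to-video", "edit-video"

# version -> (providers, operations), copied from "Versions at a glance".
WAN_VIDEO_MATRIX = {
    "v2.7": ({"fal"}, {T2V, I2V, R2V, EDIT}),
    "v2.6": ({"fal"}, {T2V, I2V, R2V}),
    "v2.5": ({"fal"}, {T2V, I2V}),
    "v2.2": ({"fal", "comfy"}, {T2V, I2V}),
    "v2.1": ({"fal", "civitai"}, {T2V, I2V}),
}


def is_listed_combination(version: str, provider: str, operation: str) -> bool:
    """True when the version table lists both the provider and the operation."""
    entry = WAN_VIDEO_MATRIX.get(version)
    if entry is None:
        return False
    providers, operations = entry
    return provider in providers and operation in operations


print(is_listed_combination("v2.6", "fal", "edit-video"))  # False
```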

## Operations

All examples target production and use `<your-token>` in place of your Bearer token. Request timeout is **100 s** — `wait` is capped accordingly. See [Results & webhooks](/orchestration/guide/results-and-webhooks) for anything longer.

### text-to-video

Prompt → video. The most common operation; supported on every WAN version.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?whatif=false&wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "wan",
      "version": "v2.6",
      "provider": "fal",
      "operation": "text-to-video",
      "prompt": "A serene forest with sunlight filtering through the trees, cinematic quality",
      "resolution": "1080p",
      "aspectRatio": "16:9",
      "duration": 5,
      "enablePromptExpansion": true
    }
  }]
}
```

### image-to-video

One or more source images animate into a clip. Supported on every version.

```json
{
  "engine": "wan",
  "version": "v2.6",
  "provider": "fal",
  "operation": "image-to-video",
  "images": [
    "https://image.civitai.com/.../19325406.jpeg"
  ],
  "prompt": "A dancing cat moving gracefully",
  "resolution": "1080p",
  "duration": 5
}
```

**v2.7 image-to-video uses `startImage` + `endImage`** (not `images[]`). Pass `startImage` to seed the first frame and optionally `endImage` to constrain the last frame (useful for loops and transitions). The `images[]` array accepted by v2.6 is not available on v2.7.

### reference-to-video *(v2.6, v2.7)*

Pass one or more reference videos; refer to them from the prompt via `@Video1`, `@Video2`, `@Video3` to transfer subjects / motion / style.

```json
{
  "engine": "wan",
  "version": "v2.6",
  "provider": "fal",
  "operation": "reference-to-video",
  "referenceVideoUrls": [
    "https://example.com/reference.mp4"
  ],
  "prompt": "@Video1 is walking through a beautiful garden",
  "resolution": "1080p",
  "aspectRatio": "16:9",
  "duration": 5
}
```

::: warning Reference video URL
The example above uses `https://example.com/reference.mp4` as a placeholder — replace with a real publicly fetchable video URL before submitting.
:::

### edit-video *(v2.7 only)*

Input video + prompt → transformed video. Preserves timing; rewrites content.

```json
{
  "engine": "wan",
  "version": "v2.7",
  "provider": "fal",
  "operation": "edit-video",
  "videoUrl": "https://example.com/input.mp4",
  "prompt": "Transform the scene into a cyberpunk aesthetic with neon lighting",
  "resolution": "1080p",
  "audioSetting": "auto"
}
```

::: warning Source video URL
Replace `https://example.com/input.mp4` with a real publicly fetchable video URL before submitting.
:::

## Common parameters

These appear on most (version, operation) combinations; the schema for your chosen variant is the source of truth.

| Field | Typical values | Notes |
|-------|----------------|-------|
| `resolution` | `720p`, `1080p` | 1080p costs more and takes longer. |
| `aspectRatio` | `16:9`, `9:16`, `1:1` | Vertical for reels/shorts. |
| `duration` | `5`, `10` (seconds) | Longer clips push you past the 100 s `wait` cap — use webhooks. |
| `enablePromptExpansion` | `true` / `false` | Let the model expand short prompts. Disable for reproducibility. |
| `enableSafetyChecker` | `true` (default) | Disable only if you handle moderation yourself. |
| `audioUrl` / `audioSetting` | URL or `auto` | Attach background audio (2.6+) or drive audio inference (2.7 edit). |

## Provider-specific features

### FAL (all versions)

Hosted inference with low queue time. FAL is the production default. `enablePromptExpansion` and audio attachment only exist on FAL variants.

### Comfy (v2.2 only)

Runs on Civitai's ComfyUI workers. This path adds three features:

* **LoRAs** via the `loras` array with AIR identifiers
  ```json
  "loras": [{ "air": "urn:air:lora:civitai:123456@789012", "strength": 0.8 }]
  ```
* **Turbo mode** (`useTurbo: true`) + frame-interpolator models (`interpolatorModel: "film"`) for faster runs at lower quality
* **Multi-step workflows** — chain `videoGen` → `videoInterpolation` → `videoUpscaler` in one `steps` array

### Civitai (v2.1 only)

Legacy self-hosted path. Accepts explicit `model` AIRs and `width`/`height` instead of `resolution`/`aspectRatio`. Migrate to FAL 2.6+ unless you have a specific reason.

## Reading the result

On success each `videoGen` step emits a single `video` blob:

```json
{
  "id": "wf_01HXYZ...",
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL; don't cache them long-term. Download and store the bytes yourself if you need durable storage.

## Long-running jobs

WAN jobs routinely run longer than 100 s (any 1080p clip ≥ 10 s; reference-to-video; edit-video). The [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) means `wait` is capped — use `wait=90` for a best-effort inline attempt, then fall back to:

* **Webhooks** (preferred): register a callback with `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks)
* **Polling**: `GET /v2/consumer/workflows/{workflowId}` on a 5 s → 10 s → 30 s cadence
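
The polling cadence above might look like the following sketch. Two assumptions are worth flagging: the set of terminal statuses is inferred from the statuses mentioned in these docs, and `fetch_workflow` / `poll_workflow` are hypothetical helpers, not part of any SDK.

```python
import itertools
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://orchestration.civitai.com/v2/consumer/workflows"
# Assumption: statuses treated as terminal here; check the status lifecycle for the full set.
TERMINAL_STATUSES = {"succeeded", "failed", "expired"}


def fetch_workflow(workflow_id: str, token: str) -> dict:
    """GET a workflow by ID using the Bearer token."""
    request = urllib.request.Request(
        f"{BASE_URL}/{workflow_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_workflow(
    fetch: Callable[[], dict],
    delays: tuple = (5, 10, 30),
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the workflow is terminal, waiting 5 s, 10 s, then 30 s between polls."""
    schedule = itertools.chain(delays, itertools.repeat(delays[-1]))
    for _, delay in zip(range(max_polls), schedule):
        workflow = fetch()
        if workflow.get("status") in TERMINAL_STATUSES:
            return workflow
        sleep(delay)
    raise TimeoutError("workflow did not reach a terminal status")
```

Typical use would be `poll_workflow(lambda: fetch_workflow(workflow_id, token))`. Passing `fetch` and `sleep` as callables keeps the loop testable without network access.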

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with unknown field | Field isn't valid for this `(version, provider, operation)` combo | Check the specific `Wan<X><Provider><Op>Input` schema via [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). |
| Step `failed`, `error.code = "no_provider"` | No capacity for that resolution/duration on the chosen provider | Retry, drop to 720p, or switch provider. |
| `workflow:processing` after `wait=90` returns | Job ran past the 100 s timeout | Expected — continue via webhook or poll. |
| Blob URL `403` after a few minutes | Signed URL expired | Refetch the workflow to get a fresh URL. |
| Reference prompt ignored | `@VideoN` tokens missing or misnumbered | Tokens are 1-indexed and must match items in `referenceVideoUrls`. |
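
The `@VideoN` rule in the last row can be checked before submitting. A minimal sketch, assuming tokens are written exactly as `@Video` followed by digits; `check_video_tokens` is a hypothetical helper.

```python
import re


def check_video_tokens(prompt: str, reference_video_urls: list[str]) -> list[str]:
    """Report @VideoN tokens that do not map to an entry in referenceVideoUrls."""
    problems = []
    used = {int(n) for n in re.findall(r"@Video(\d+)", prompt)}
    if not used:
        problems.append("prompt contains no @VideoN token")
    for n in sorted(used):
        # Tokens are 1-indexed: @Video1 refers to reference_video_urls[0].
        if n < 1 or n > len(reference_video_urls):
            problems.append(f"@Video{n} has no matching entry in referenceVideoUrls")
    return problems


print(check_video_tokens("@Video2 walks in a garden", ["https://example.com/reference.mp4"]))
# ['@Video2 has no matching entry in referenceVideoUrls']
```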

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

WAN video pricing varies by `version`, `provider`, `resolution`, and acceleration flags. Figures below are per video unless a per-second rate is shown.

### v2.7 (FAL)

Flat per-second across resolutions:

```
total = 130 × duration_seconds
```

* 5 s → **~650 Buzz**
* 10 s → ~1 300 Buzz

### v2.6 (FAL)

Resolution-scaled per-second:

| Resolution | Buzz per second |
|-----------|-----------------|
| `720p` | **130** |
| `1080p` | **195** |

* 720p × 5 s → **~650 Buzz**
* 1080p × 5 s → **~975 Buzz**

### v2.5 (FAL)

Resolution-scaled per-second:

```
total = 100 × resolutionFactor × duration
```

with `resolutionFactor` = 1 (480p) / 2 (720p) / 3 (1080p).

* 720p × 5 s → **1 000 Buzz**
* 720p × 10 s → **2 000 Buzz**
* 480p × 5 s → **500 Buzz**
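
The three per-second schemes above (`v2.7`, `v2.6`, `v2.5`) can be folded into one estimator. This is a budgeting sketch using the rates quoted on this page; `wan_fal_video_cost` is a hypothetical helper, and `whatif=true` gives the exact figure.

```python
V26_BUZZ_PER_SECOND = {"720p": 130, "1080p": 195}
V25_RESOLUTION_FACTOR = {"480p": 1, "720p": 2, "1080p": 3}


def wan_fal_video_cost(version: str, resolution: str, duration_seconds: int) -> int:
    """Estimate Buzz for a FAL-hosted WAN clip on v2.5, v2.6, or v2.7."""
    if version == "v2.7":
        return 130 * duration_seconds  # flat across resolutions
    if version == "v2.6":
        return V26_BUZZ_PER_SECOND[resolution] * duration_seconds
    if version == "v2.5":
        return 100 * V25_RESOLUTION_FACTOR[resolution] * duration_seconds
    raise ValueError(f"no per-second formula sketched for {version!r}")


print(wan_fal_video_cost("v2.6", "1080p", 5))  # 975
print(wan_fal_video_cost("v2.5", "720p", 10))  # 2000
```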

### v2.2 (FAL, `text-to-video` / `image-to-video`)

Driven by Turbo / LoRA / standard, plus resolution:

| Mode | `720p` | `580p` | `480p` |
|------|--------|--------|--------|
| Turbo | **~130 / video** (flat) | ~97.5 | ~65 |
| With LoRA | **~94.9 × duration** | ~94.9 × duration | ~94.9 × duration |
| Standard | **~104 × duration** | ~78 × duration | ~52 × duration |

Typical: 720p Turbo 5 s → ~130; 720p standard 5 s → ~520; 720p LoRA 5 s → ~475.

### v2.2-5b (FAL)

| Mode | Buzz per video |
|------|----------------|
| Fast-wan `720p` | **~32.5** |
| Distill | ~75.9 |
| Standard | ~142.35 |
| Image-to-video | ~142.35 (flat) |

### v2.2 (Comfy, Civitai-hosted)

Variable per-pixel-per-step formula with an 8× markup, minimum **100 Buzz**, rounded up to the nearest 25:

```
areaCost    = max(a × width × height + b, 0)    // per-frame per-step compute factor
duration    = length × steps × areaCost
buzz        = max(100, ceil((duration × 420/3600 × 8) / 25) × 25)
```

Where `(a, b)` is `(1.22e-6, -0.14)` for image-to-video, `(2.53e-7, -0.0259)` for text-to-video. This path is noticeably more expensive per second than the FAL routes — FAL is the default for a reason.
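
A direct transcription of the formula above, as a sketch. It assumes `length` is the clip length in frames, which the formula does not state explicitly; `wan22_comfy_cost` is a hypothetical helper.

```python
import math

# (a, b) coefficients per operation, as quoted above.
COEFFICIENTS = {
    "image-to-video": (1.22e-6, -0.14),
    "text-to-video": (2.53e-7, -0.0259),
}


def wan22_comfy_cost(operation: str, width: int, height: int, length: int, steps: int) -> int:
    """Buzz estimate for the v2.2 Comfy path: 8x markup, 100 minimum, rounded up to 25."""
    a, b = COEFFICIENTS[operation]
    area_cost = max(a * width * height + b, 0)  # per-frame per-step compute factor
    duration = length * steps * area_cost       # assumption: length is in frames
    return max(100, math.ceil((duration * 420 / 3600 * 8) / 25) * 25)


print(wan22_comfy_cost("text-to-video", width=832, height=480, length=81, steps=20))  # 125
```

Small frames clamp `areaCost` to zero, so the 100 Buzz minimum applies.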

### v2.1 (legacy)

```
total = 100 × resolutionFactor × duration
```

with `resolutionFactor` = 1 (480p) / 2 (720p) / 3 (1080p). 720p × 5 s → ~1 000 Buzz.

### Quick reference

For new integrations on `v2.6` / `v2.7` at 5 s with no LoRAs, expect **~650 Buzz per video** at 720p, rising to **~975 Buzz** at 1080p on `v2.6`. Always run `whatif=true` before long-duration or high-res submissions: costs scale linearly with duration and can escalate fast.

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production-ready result handling
* Full parameter catalog: the `Wan<version><Provider><Operation>Input` schemas in the [API reference](/orchestration/reference/) (e.g. `Wan26FalTextToVideoInput`, `Wan27FalEditVideoInput`)
* [`videoGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/videoGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `videoGen` surface (WAN, LTX2, Flux, etc.); import into Postman / OpenAPI Generator

---

---
url: /orchestration/recipes/training-wan.md
---

# Wan video LoRA training

::: warning Preview ecosystem
Wan video training is currently marked **Preview** in the orchestrator. The endpoint accepts requests and `whatif=true` cost previews work, but actual training runs may not be available on every worker fleet. Reach out via [Civitai Discord](https://civitai.com/discord) before integrating against production traffic.
:::

Train a [WAN](./wan) video LoRA on a small set of source video clips using AI Toolkit. Output is a video LoRA usable in WAN text-to-video and image-to-video generation.

| `modelVariant` | Wan family | Buzz / epoch |
|----------------|-----------|--------------|
| `2.1` | Wan 2.1 (14B) | 12 |
| `2.2` | Wan 2.2 (14B-A14B) | 12 |

::: tip Long-running step
Video training is the slowest training mode on the platform: roughly 3–10 minutes per epoch on a 4-clip dataset. Always use `wait=0` and follow up via webhook or polling.
:::

## The request shape

```json
{
  "$type": "training",
  "input": {
    "engine":       "ai-toolkit",
    "ecosystem":    "wan",
    "modelVariant": "2.1"        // 2.1 | 2.2
  }
}
```

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A training-data zip containing source video clips (each ≤ a few seconds, similar resolution)
* An accurate `count` of clips in the zip

## Wan 2.1 / 2.2

Both variants share the same input shape and per-epoch cost; pick the one that matches your inference target. The example below uses `2.1`; swap `modelVariant` to `"2.2"` for Wan 2.2 training (no other change required).

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "tags": ["training", "video"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "wan",
      "modelVariant": "2.1",
      "epochs": 2,
      "resolution": 512,
      "lr": 0.0002,
      "trainTextEncoder": false,
      "lrScheduler": "constant",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/5418/2202966TrainingData.Kjwp.zip",
        "count": 4
      },
      "samples": {
        "prompts": ["a video of TOK", "TOK moving in a garden"]
      }
    }
  }]
}
```

## Common parameters {#common-parameters}

Defaults shown are the post-`ApplyDefaults` values for Wan.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `engine` | ✅ | — | Always `ai-toolkit`. |
| `ecosystem` | ✅ | — | Always `wan` for this page. |
| `modelVariant` | ✅ | — | `2.1` or `2.2`. |
| `epochs` | | `5` | `1`–`20`. Billed per epoch. Keep low (2–5) for video — the per-epoch step count is much higher than image. |
| `numberOfRepeats` | | (no auto-default for Wan) | `1`–`5000`. |
| `lr` | | `0.0001` | `0.0002` is a typical override for video; see example. |
| `trainTextEncoder` | | `false` | Leave off — Wan training does not benefit from text-encoder updates. |
| `lrScheduler` | | `cosine` | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. |
| `optimizerType` | | `adamw8bit` | See SDXL/SD1 page for full enum. |
| `networkDim` | | `32` | `1`–`256`. |
| `networkAlpha` | | matches `networkDim` | `1`–`256`. |
| `noiseOffset` | | `0` | `0`–`1`. |
| `flipAugmentation` | | `false` | Random horizontal flips. |
| `shuffleTokens` / `keepTokens` | | `false` / `0` | Caption-tag shuffling. |
| `triggerWord` | | *(none)* | Activation token. Per the source, not all video ecosystems support `triggerWord` — leave empty if you see schema rejections. |
| `trainingData.{type, sourceUrl, count}` | ✅ | — | `type: "zip"`. Zip should contain video clips. |
| `samples.prompts[]` | | `[]` | Per-epoch preview videos rendered with the trained LoRA. |
| `samples.negativePrompt` | | *(none)* | — |
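
The ranges in the table can be checked locally before a long training run is queued. This is a sketch covering only the limits listed above; `check_wan_training_input` is a hypothetical helper, not the orchestrator's validator.

```python
# Numeric ranges and enums copied from the parameter table above.
TRAINING_RANGES = {
    "epochs": (1, 20),
    "numberOfRepeats": (1, 5000),
    "networkDim": (1, 256),
    "networkAlpha": (1, 256),
    "noiseOffset": (0, 1),
}
LR_SCHEDULERS = {"constant", "constant_with_warmup", "cosine", "linear", "step"}


def check_wan_training_input(inp: dict) -> list[str]:
    """Return a list of problems; an empty list means the input passes these checks."""
    problems = []
    if inp.get("modelVariant") not in {"2.1", "2.2"}:
        problems.append('modelVariant must be "2.1" or "2.2"')
    for field, (low, high) in TRAINING_RANGES.items():
        if field in inp and not low <= inp[field] <= high:
            problems.append(f"{field} must be within {low}-{high}")
    if "lrScheduler" in inp and inp["lrScheduler"] not in LR_SCHEDULERS:
        problems.append("lrScheduler is not a supported value")
    data = inp.get("trainingData") or {}
    if not all(key in data for key in ("type", "sourceUrl", "count")):
        problems.append("trainingData needs type, sourceUrl, and count")
    return problems
```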

## Reading the result

Same envelope as the other training recipes — see [SDXL/SD1 → Reading the result](./training-sdxl-sd1#reading-the-result). Each epoch yields a video LoRA `.safetensors` blob plus any sample `.mp4` files. The trained LoRA is usable in [WAN video generation](./wan) by referencing it in the `loras` field.

## Runtime

Per-epoch wall time, default settings on a 4-clip dataset:

| Variant | Per-epoch | Typical full run |
|---------|-----------|-------------------|
| `2.1` | ~3–10 min | 6–20 min for 2 epochs |
| `2.2` | ~3–10 min | 6–20 min for 2 epochs |

Always use `wait=0`.

## Cost

```
total = 12 × epochs   (Buzz, base cost)
```

Cost-per-epoch is `12` per the orchestrator source. Sample-prompt rendering uses Wan video-generation rates (much higher than image samples) and is billed separately. Run with `whatif=true` to see the exact pre-flight charge.

| Configuration | Buzz (training only) |
|---------------|---------------------|
| `epochs: 2` | 24 + samples |
| `epochs: 5` | 60 + samples |

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "modelVariant required" | Missing `modelVariant` | Set to `"2.1"` or `"2.2"`. |
| Step starts then fails immediately | Preview ecosystem not yet enabled on the routing GPU fleet | Contact Civitai support — Wan training is rolling out. |
| Step `failed` with VRAM-related error | Resolution × clip length too high for the worker | Lower `resolution` (e.g. to `512`), shorten clips to ≤ 3 seconds. |
| Trained LoRA produces static / no motion | Too few epochs, too few / too short clips | Raise `epochs` to 3–5; ensure clips show the motion you want learned. |
| Step `failed`, `moderationStatus: "Rejected"` | Dataset failed content moderation | Replace flagged clips. |

## Related

* [LTX2 video LoRA training](./training-ltx2) — Lightricks LTX video LoRA training (also video, less expensive previews on LTX2.3)
* [WAN video generation](./wan) — use a trained LoRA in WAN inference
* [Flux 2 Klein LoRA training](./training-flux2-klein) — image-side counterpart
* [Results & webhooks](/orchestration/guide/results-and-webhooks)
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) / [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow)
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/training/openapi.yaml)

---

---
url: /orchestration/guide/workflows.md
---

# Workflows

The orchestrator has three nested concepts. Most of what you do lives at the top two; the third is mostly useful for diagnostics.

* **Workflow** — what you submit. A container with metadata, tags, payment rules, and one or more steps. Identified by a `workflowId`.
* **Step** — a unit of work inside a workflow. Each step has a `$type` (the step type — `imageGen`, `videoGen`, `transcription`, …) and an `input`, and produces one or more outputs.
* **Job** — a unit of work the step needs done. A step may emit zero, one, or many jobs — that's up to the step. Providers race to claim each job (only compatible providers compete), and the winning provider executes it. You don't schedule jobs directly.

(*A **recipe** is a separate concept: a typed convenience endpoint at `/v2/consumer/recipes/{name}` that wraps a single-step workflow so callers who only need one step type can skip the polymorphic `$type`. See [Submitting Work → The per-recipe path](./submitting-work#the-per-recipe-path).*)

```mermaid
graph LR
    WF[Workflow<br/>wf_01HXYZ...] --> S1[Step 0<br/>$type: imageGen]
    WF --> S2[Step 1<br/>$type: videoGen]
    S1 --> J1[Job a<br/>claimed by FAL]
    S2 --> J2[Job b<br/>claimed by Civitai]
    S2 --> J3[Job c<br/>claimed by FAL]
```

## Workflows

A workflow is what you `POST` to [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). It lives from submission until it hits a terminal status, after which it remains queryable via [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) for audit and result retrieval.

Key fields on a returned [`Workflow`](/orchestration/reference/operations/GetWorkflow):

| Field | What it tells you |
|-------|-------------------|
| `id` | Opaque workflow ID (prefixed `wf_`). Use everywhere you reference this workflow. |
| `status` | Current lifecycle state — see [Status lifecycle](#status-lifecycle). |
| `steps` | Array of [`WorkflowStep`](/orchestration/reference/operations/GetWorkflow) objects — the actual work and results. |
| `createdAt` / `startedAt` / `completedAt` | Timing. `startedAt` is null until the orchestrator begins execution; `completedAt` is null until terminal. |
| `cost` / `transactions` | What was charged, broken down by Buzz currency — see [Submitting Work → Payments](./submitting-work#payments-buzz). |
| `tags` | Indexed strings you can filter by via [`QueryWorkflows`](/orchestration/reference/operations/QueryWorkflows). |
| `metadata` | Arbitrary JSON you attached at submission (or via [`UpdateWorkflow`](/orchestration/reference/operations/UpdateWorkflow)). Not indexed. |
| `nsfwLevel` | Content classification, computed from step outputs. |
| `callbacks` | Registered webhooks — see [Results & webhooks](./results-and-webhooks). |

Submission-only fields (`upgradeMode`, `allowMatureContent`, `currencies`, `experimental`, `arguments`) are echoed back on the response as well, so you can see what policy the workflow ran under.

## Steps

Each entry in `steps` is a [`WorkflowStep`](/orchestration/reference/operations/GetWorkflow). Steps are how you express *what* to do; the `$type` discriminator picks the step type and the `input` schema.

```json
{
  "$type": "videoGen",
  "name": "clip",
  "input": { "engine": "wan", "version": "v2.6", /* ... */ },
  "priority": "normal",
  "timeout": "00:10:00",
  "retries": 2
}
```

| Field | Purpose |
|-------|---------|
| `$type` | The step type. Switches which `input` schema applies. See the [API reference](/orchestration/reference/) for the full set (`imageGen`, `videoGen`, `imageUpscaler`, `transcription`, `textToSpeech`, `chatCompletion`, `comfy`, …). |
| `name` | Optional step name. Needed if another step wants to refer to this one. Defaults to the array index (`"0"`, `"1"`, …). |
| `input` | Payload specific to the step type — the bulk of a real request. |
| `priority` | Scheduling hint (`low`, `normal`, `high`). Higher priority jumps the queue, subject to your tier. |
| `timeout` | Duration, written as `hh:mm:ss` in the example above: the maximum time this step may run before the orchestrator marks it `expired`. |
| `retries` | Maximum retry count before declaring the step `failed`. Each retry may claim a different provider. |

On a fetched workflow, each step also carries:

| Field | What it tells you |
|-------|-------------------|
| `status` | Per-step status. A workflow's overall status rolls up from its steps. |
| `jobs` | The concrete jobs the orchestrator ran for this step. |
| `estimatedProgressRate` | `0.0`–`1.0` estimate of how far along the step is — see [How `estimatedProgressRate` is calculated](#how-estimatedprogressrate-is-calculated) below. Null if the step hasn't started. |
| `startedAt` / `completedAt` | Step-level timing. |
| `metadata` | Step-scoped metadata you attached. |

You can amend a step before it starts running via [`UpdateWorkflowStep`](/orchestration/reference/operations/UpdateWorkflowStep) / [`PatchWorkflowStep`](/orchestration/reference/operations/PatchWorkflowStep) — useful for fixing an `input` mistake while the workflow is still `unassigned`.

## Jobs

A job is a single unit of work the step needs done. A step decides how many jobs to emit (**zero, one, or many**) based on its inputs: a batch-of-4 image generation emits four jobs, a single `chatCompletion` step emits one, and a validation-only step may emit zero. Each emitted job is published to the provider pool; compatible providers race to claim it, and the winner runs it.

Most consumer code can ignore the jobs array and just look at step-level status and output — but jobs are where you see *why* something failed, how long it queued, and which provider actually ran it.

| Field | Purpose |
|-------|---------|
| `id` | Internal job ID (for support / logs). |
| `status` | Same enum as workflow/step. |
| `queuePosition` | A job is queued with every compatible provider simultaneously; this field reports the position in **one** of those queues — the provider most likely to claim the job next. A different provider may still win the race. |
| `cost` | A relative **complexity** score for this job — like a query-planner cost, with no absolute unit. A cost of `4` is roughly twice as complex as a cost of `2`. This is *not* what you get charged: Buzz charges live on the workflow's `transactions` / `cost` ([Payments](./submitting-work#payments-buzz)). |
| `estimatedProgressRate` | `0.0`–`1.0` progress estimate — see [How `estimatedProgressRate` is calculated](#how-estimatedprogressrate-is-calculated). |
| `reason` / `blockedReason` | On failure, why — surfaced into step status as well. See [Errors & retries → Step-level failures](./errors-and-retries#step-level-failures). |

Multiple jobs on a step can come from two distinct sources:

* **Fan-out**: the step legitimately needs multiple units of work (e.g. a batch generating N images emits N jobs). All of them must succeed for the step to succeed.
* **Retries**: if `retries > 0` and a job fails transiently, the orchestrator emits a replacement job. The step succeeds as soon as the retry succeeds.

You can tell these apart by looking at the jobs' timing and `reason` — retry jobs appear after a failed sibling, fan-out jobs run in parallel from the start.

### How `estimatedProgressRate` is calculated

`estimatedProgressRate` is derived, not reported. The orchestrator combines two numbers:

* **Job cost** — the relative complexity score on the job (see above).
* **Worker throughput** — the recorded cost-per-second the worker has been completing on recent jobs.

Progress is `elapsed × throughput / cost`, clamped to `0.0`–`1.0`. It's useful for driving progress bars and ETAs, but treat it as a hint: cost estimation isn't perfect and real-world throughput varies by job shape, so actual completion can arrive earlier or later than the estimate suggests.
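
The formula can be reproduced client-side, for example to sanity-check a progress bar. This is a minimal sketch of the arithmetic described above, not the orchestrator's implementation; the function name and the handling of a non-positive cost are assumptions.

```python
def estimate_progress(elapsed_seconds: float, throughput: float, cost: float) -> float:
    """Return elapsed × throughput / cost, clamped to the range 0.0 to 1.0.

    throughput is in cost units per second, so the ratio is dimensionless.
    """
    if cost <= 0:
        # Assumption: a job with no measurable cost is treated as complete.
        return 1.0
    raw = elapsed_seconds * throughput / cost
    return max(0.0, min(1.0, raw))


# A job of cost 20 on a worker completing 2 cost units per second, 5 s in:
print(estimate_progress(5, 2.0, 20))  # 0.5
```

Because the value is clamped, a job that runs longer than predicted sits at `1.0` until it actually reaches a terminal status.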

## Dependencies & parallelism

By default the orchestrator runs as much as it can at the same time:

* **Steps run concurrently** unless one depends on another.
* **Jobs run concurrently** within a step unless one depends on another.
* A step's jobs don't start until the step's dependencies have resolved. A step may even emit **more jobs asynchronously while it's still in progress** — `jobs` is not a fixed set you can count at the start.

### What creates a dependency

A dependency is created when one step *consumes the output of another*. Anywhere a step's `input` accepts a value, you can substitute a reference object instead:

```json
{
  "$ref": "<source name>",
  "path": "<dotted path into the source>"
}
```

The orchestrator resolves the reference at runtime and wires up the edge automatically.

```json
{
  "steps": [
    {
      "$type": "imageGen",
      "name": "hero",
      "input": { "prompt": "a cat astronaut", "width": 1024, "height": 1024 }
    },
    {
      "$type": "imageUpscaler",
      "name": "hero-4k",
      "input": {
        "image": { "$ref": "hero", "path": "output.images[0].url" },
        "numberOfRepeats": 1
      }
    }
  ]
}
```

`hero-4k` now depends on `hero`; it stays `unassigned` until `hero` reaches `succeeded`, then the `image` field is populated with the resolved URL. If `hero` fails, `hero-4k` is canceled.

**Source names** in `$ref`:

* A step's explicit `name` — e.g. `"hero"` above.
* The implicit positional name `"$0"`, `"$1"`, … if the step didn't set one (`$0` = first step, `$1` = second, …).
* `"$arguments"` to read from workflow-level arguments (see below).
* A loop variable inside `repeat`-style steps (e.g. `"frame"` if the loop is bound `as: "frame"`).

The exact `path` available depends on the source step's output schema — see each step type's reference page for the shape.
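
To see which edges a request body will create before submitting it, you can walk each step's `input` and collect the reference objects. The sketch below is a client-side illustration only; `iter_refs` and `step_dependencies` are hypothetical helper names, and it assumes unnamed steps are addressed by the positional names listed above.

```python
from typing import Any, Iterator


def iter_refs(value: Any) -> Iterator[dict]:
    """Yield every {"$ref": ..., "path": ...} object nested inside value."""
    if isinstance(value, dict):
        if "$ref" in value:
            yield value
        else:
            for child in value.values():
                yield from iter_refs(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_refs(child)


def step_dependencies(workflow: dict) -> dict[str, set[str]]:
    """Map each step name to the set of source names its input references."""
    dependencies = {}
    for index, step in enumerate(workflow.get("steps", [])):
        name = step.get("name") or f"${index}"
        sources = {ref["$ref"] for ref in iter_refs(step.get("input", {}))}
        # Workflow arguments are not steps, so they create no step-to-step edge.
        sources.discard("$arguments")
        dependencies[name] = sources
    return dependencies


workflow = {
    "steps": [
        {"$type": "imageGen", "name": "hero",
         "input": {"prompt": "a cat astronaut", "width": 1024, "height": 1024}},
        {"$type": "imageUpscaler", "name": "hero-4k",
         "input": {"image": {"$ref": "hero", "path": "output.images[0].url"},
                   "numberOfRepeats": 1}},
    ]
}
print(step_dependencies(workflow))  # {'hero': set(), 'hero-4k': {'hero'}}
```

Loop variables would also appear in the result as source names; filter them out if you only care about step-to-step edges.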

### Workflow arguments

The workflow-level `arguments` field lets you template values that steps reference. Define the shape once, submit many instances with different argument values:

```json
{
  "arguments": {
    "prompt": "a cat astronaut",
    "seed": 42
  },
  "steps": [
    {
      "$type": "imageGen",
      "input": {
        "prompt": { "$ref": "$arguments", "path": "prompt" },
        "seed":   { "$ref": "$arguments", "path": "seed" }
      }
    }
  ]
}
```

This is what makes saved workflows useful — your UI or SDK can ship a stable workflow template and only change `arguments` per submission.
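
As a rough illustration of that pattern, a client can keep the template as a constant and produce a fresh body per submission. This is a sketch, not an SDK feature; `with_arguments` is a hypothetical helper and the template simply mirrors the JSON above.

```python
import copy

TEMPLATE = {
    "arguments": {"prompt": "a cat astronaut", "seed": 42},
    "steps": [
        {
            "$type": "imageGen",
            "input": {
                "prompt": {"$ref": "$arguments", "path": "prompt"},
                "seed": {"$ref": "$arguments", "path": "seed"},
            },
        }
    ],
}


def with_arguments(template: dict, **overrides) -> dict:
    """Return a copy of the template with some argument values replaced."""
    body = copy.deepcopy(template)  # never mutate the shared template
    body["arguments"] = {**template.get("arguments", {}), **overrides}
    return body


body = with_arguments(TEMPLATE, prompt="a dog astronaut", seed=7)
print(body["arguments"])  # {'prompt': 'a dog astronaut', 'seed': 7}
```

The `steps` array is identical across submissions; only `arguments` changes.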

## Deduplication & caching

The orchestrator tries not to charge you twice for the same work. Two mechanisms help, both with caveats.

### Workflow deduplication

Submitting a workflow whose steps and arguments exactly match one you already submitted typically gets **deduplicated** — you'll receive a reference to the existing workflow instead of a new one spinning up. Useful for idempotent retries on your side.

### Result caching

Step outputs are cached when the step's inputs are deterministic enough to replay. On submission, if the orchestrator can resolve a step's output purely from cache, it serves the cached result and **doesn't charge you Buzz for that step**.

Cache entries have a finite lifetime — typically around 30 days, but that's not guaranteed. An eviction between submissions means the second run pays full price.

### Caveats

Both mechanisms have **undocumented exceptions** — certain step types, parameter combinations, or account states bypass dedup / cache. Don't build correctness guarantees on top of them; treat them purely as a cost optimization. If you need exactly-once semantics, track your own `workflowId`s on your side.
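
One way to track submissions yourself is to key them by a fingerprint of the request body. The sketch below is purely client-side and makes no claim about how the orchestrator computes its own dedup key; `workflow_fingerprint` and `SubmissionLedger` are hypothetical names, and the in-memory dictionary stands in for whatever store you actually use.

```python
import hashlib
import json
from typing import Optional


def workflow_fingerprint(body: dict) -> str:
    """Hash the steps and arguments of a request body, ignoring key order."""
    relevant = {"steps": body.get("steps", []), "arguments": body.get("arguments", {})}
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SubmissionLedger:
    """Remember which workflow ID was returned for each distinct body."""

    def __init__(self) -> None:
        self._ids: dict[str, str] = {}

    def lookup(self, body: dict) -> Optional[str]:
        return self._ids.get(workflow_fingerprint(body))

    def record(self, body: dict, workflow_id: str) -> None:
        self._ids[workflow_fingerprint(body)] = workflow_id
```

Before submitting, call `lookup`; if it returns an ID, fetch that workflow instead of submitting again, and call `record` after each successful submission.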

## Status lifecycle

Workflows, steps, and jobs share the same [`WorkflowStatus`](/orchestration/reference/operations/GetWorkflow) enum:

```
unassigned → preparing → scheduled → processing ──▶ succeeded
                                             │
                                             ├──▶ failed
                                             ├──▶ expired
                                             └──▶ canceled
```

* `unassigned` — submitted, not yet routed to a provider
* `preparing` — resolving resources, warming caches, validating inputs
* `scheduled` — claimed by a provider, waiting in its queue
* `processing` — actively running
* `succeeded` / `failed` / `expired` / `canceled` — **terminal**; status will not change again

Terminal states are final, and webhook delivery enforces that invariant (see [Results & webhooks → Delivery semantics](./results-and-webhooks#delivery-semantics)). Workflow status rolls up from step status: if all steps succeeded, the workflow succeeded; if any step failed, expired, or was canceled, the workflow does the same.
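
A polling loop only needs to know which statuses are terminal. The following is a minimal sketch: `fetch_workflow` is a caller-supplied function (for example, one that wraps `GetWorkflow`), and the interval and attempt limit are arbitrary assumptions rather than recommended values.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed", "expired", "canceled"}


def is_terminal(status: str) -> bool:
    """True once the status can no longer change."""
    return status in TERMINAL_STATUSES


def poll_until_terminal(
    fetch_workflow: Callable[[], dict],
    interval_seconds: float = 2.0,
    max_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_workflow repeatedly until the workflow reaches a terminal status."""
    for _ in range(max_attempts):
        workflow = fetch_workflow()
        if is_terminal(workflow["status"]):
            return workflow
        sleep(interval_seconds)
    raise TimeoutError("workflow did not reach a terminal status in time")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.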

## When to look at each level

| Question | Where to look |
|----------|--------------|
| Did my request finish? What's the output? | Workflow `status` + `steps[].output` |
| Which step in a chain broke? | Per-step `status` + `reason` |
| Why did it fail for all providers? | Last job's `reason` / `blockedReason` |
| How much did it cost, split by currency? | Workflow `cost` + `transactions` |
| Was it SFW or mature? | Workflow `nsfwLevel` |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — create
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — fetch
* [`QueryWorkflows`](/orchestration/reference/operations/QueryWorkflows) — list / filter
* [`UpdateWorkflow`](/orchestration/reference/operations/UpdateWorkflow) / [`PatchWorkflow`](/orchestration/reference/operations/PatchWorkflow) — amend
* [`DeleteWorkflow`](/orchestration/reference/operations/DeleteWorkflow) — cancel
* [Submitting Work](./submitting-work) — the body and query parameters you can pass at submission
* [Errors & retries](./errors-and-retries) — what step / job failures look like

---

---
url: /orchestration/recipes/zimage.md
---

# Z-Image generation

Z-Image is a lightweight text-to-image model family that runs on Civitai's sdcpp workers. It comes in two variants within the same ecosystem:

| `model` | Typical use | Defaults |
|---------|-------------|----------|
| `turbo` | **Default** — distilled model, extremely fast and cheap, high enough quality for most workloads | `cfgScale: 1`, `steps: 9` |
| `base` | Upgrade tier — use when `turbo` isn't delivering enough fidelity for a specific prompt | `cfgScale: 4`, `steps: 20` |

Both share the same `engine: "sdcpp"`, `ecosystem: "zImage"` invocation — they differ in default sampler tuning and the expected usage pattern. Neither supports img2img or image editing; the only operation is `createImage`.

**Default choice**: `model: "turbo"` at `cfgScale: 1` / `steps: 9`. Switch to `base` (with `cfgScale: 4` / `steps: 20`) when you need more fidelity — better prompt adherence, cleaner detail, or working negative prompts.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* No checkpoint URN needed — the ecosystem ships its own models; you pick between `base` and `turbo` via the `model` field

## The request shape

Every Z-Image request is a single `imageGen` step routed through sdcpp:

```json
{
  "$type": "imageGen",
  "input": {
    "engine":    "sdcpp",
    "ecosystem": "zImage",
    "model":     "turbo",        // turbo | base
    "operation": "createImage"
  }
}
```

The orchestrator dispatches to the matching input schema (`ZImageTurboCreateImageGenInput` or `ZImageBaseCreateImageGenInput`), so only the fields valid for that combination are accepted — [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) will `400` on unknown ones.

## turbo (default)

Turbo is the distilled Z-Image variant — fast, cheap, and good enough for almost every workload. Low CFG (`cfgScale: 1`, effectively disabling classifier-free guidance) and short step counts make it the cost/quality sweet spot:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "zImage",
      "model": "turbo",
      "operation": "createImage",
      "prompt": "A cozy cabin in the woods at sunset, cinematic lighting",
      "width": 1024,
      "height": 1024,
      "cfgScale": 1,
      "steps": 9
    }
  }]
}
```

::: tip Turbo tuning
Keep `cfgScale` at `1` and `steps` at `8`–`12`. Pushing either up negates the turbo speedup without meaningfully improving quality — if you need better output, switch to `base` instead of cranking turbo's knobs.
:::
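
For callers working from Python, the same turbo request can be assembled with the standard library. This is a minimal sketch: `build_turbo_request` is a hypothetical helper name, the body mirrors the HTTP example above, and the request is only built here, not sent.

```python
import json
import urllib.request

ENDPOINT = "https://orchestration.civitai.com/v2/consumer/workflows"


def build_turbo_request(token: str, prompt: str, wait_seconds: int = 60) -> urllib.request.Request:
    """Build (but do not send) a Z-Image turbo submission."""
    body = {
        "steps": [{
            "$type": "imageGen",
            "input": {
                "engine": "sdcpp",
                "ecosystem": "zImage",
                "model": "turbo",
                "operation": "createImage",
                "prompt": prompt,
                "width": 1024,
                "height": 1024,
                "cfgScale": 1,
                "steps": 9,
            },
        }]
    }
    return urllib.request.Request(
        f"{ENDPOINT}?wait={wait_seconds}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_turbo_request("<your-token>", "A cozy cabin in the woods at sunset")
# To send it:
# with urllib.request.urlopen(request) as response:
#     workflow = json.load(response)
```

Keeping construction separate from sending makes the body easy to inspect or log before any Buzz is spent.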

### Batching with turbo

Turbo's low cost makes it a natural fit for multi-image calls. `quantity` up to `12` is supported on the schema, though you'll generally hit the 100-second request timeout above ~4–6 depending on dimensions:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "zImage",
      "model": "turbo",
      "operation": "createImage",
      "prompt": "A majestic fox with flowing tails in an enchanted garden",
      "width": 1024,
      "height": 1024,
      "quantity": 4,
      "cfgScale": 1,
      "steps": 9
    }
  }]
}
```

For larger batches, submit with `wait=0` and poll (see [Results & webhooks](/orchestration/guide/results-and-webhooks)).

## base (fallback when turbo isn't enough)

Step up to `base` when turbo isn't delivering: prompts that need strong adherence, fine detail work, or negative-prompt conditioning. It runs with a higher `cfgScale` (`4` is the default) and more sampler steps (`20`+), at the cost of higher wall time and spend.

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "zImage",
      "model": "base",
      "operation": "createImage",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "width": 1024,
      "height": 1024,
      "cfgScale": 4,
      "steps": 20
    }
  }]
}
```

### With a negative prompt

Z-Image Base honours negative prompts; they steer the model away from undesired content. (Turbo effectively ignores them at `cfgScale: 1`, so this is one of the cleanest reasons to step up.)

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "zImage",
      "model": "base",
      "operation": "createImage",
      "prompt": "A detailed anime character in a magical forest, ethereal lighting, masterpiece",
      "negativePrompt": "blurry, low quality, deformed hands, bad anatomy, watermark, text",
      "width": 1024,
      "height": 1024,
      "cfgScale": 4,
      "steps": 24
    }
  }]
}
```

### With LoRAs

Z-Image LoRAs are a map of AIR URN → strength — same shape as every other sdcpp ecosystem. LoRAs work on both `turbo` and `base`; this example is on `base` because LoRA-driven styles usually benefit from the higher-fidelity tier:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "zImage",
      "model": "base",
      "operation": "createImage",
      "prompt": "A cyberpunk street scene with neon signs and rain reflections",
      "width": 1024,
      "height": 1024,
      "cfgScale": 4,
      "steps": 20,
      "loras": {
        "urn:air:zImage:lora:civitai:123456@789012": 0.8
      }
    }
  }]
}
```

## Common parameters

Both `turbo` and `base` share the same schema — only defaults differ. See the [`ZImageTurboCreateImageGenInput`](/orchestration/reference/operations/InvokeImageGenStepTemplate) and [`ZImageBaseCreateImageGenInput`](/orchestration/reference/operations/InvokeImageGenStepTemplate) schemas for the complete field list.

| Field | Required | Turbo default | Base default | Range | Notes |
|-------|----------|---------------|--------------|-------|-------|
| `prompt` | ✅ | — | — | ≤ 10 000 chars | Natural-language descriptions with lighting / composition / camera cues. |
| `negativePrompt` | | *(none)* | *(none)* | ≤ 10 000 chars | Most useful on `base`; effectively ignored on `turbo` because `cfgScale: 1`. |
| `width` / `height` | | `1024` | `1024` | `64`–`2048` | Divisible by 16. |
| `cfgScale` | | `1` | `4` | `0`–`30` | Turbo: keep at `1`. Base: `3`–`5` is the sweet spot. |
| `steps` | | `9` | `20` | `1`–`150` | Turbo: `8`–`12`. Base: `20`–`30`. |
| `sampleMethod` | | `euler` | `euler` | enum | [`SdCppSampleMethod`](/orchestration/reference/). |
| `schedule` | | `simple` | `simple` | enum | [`SdCppSchedule`](/orchestration/reference/). |
| `loras` | | `{}` | `{}` | `{ airUrn: strength }` | Stack multiple; strengths in `0.0`–`2.0` are typical. |
| `quantity` | | `1` | `1` | `1`–`12` | Number of images per call. |
| `seed` | | random | random | int64 | Pin for reproducibility. |
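
The ranges in the table can be checked client-side before submitting, which avoids a round trip for obviously invalid input. A sketch, assuming the table above is accurate for the fields it lists; `validate_zimage_input` is a hypothetical helper, and the server's own validation remains authoritative.

```python
def validate_zimage_input(params: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        problems.append("prompt is required")
    elif len(prompt) > 10_000:
        problems.append("prompt exceeds 10 000 characters")

    if len(params.get("negativePrompt", "")) > 10_000:
        problems.append("negativePrompt exceeds 10 000 characters")

    for key in ("width", "height"):
        value = params.get(key, 1024)
        if not 64 <= value <= 2048:
            problems.append(f"{key} must be between 64 and 2048")
        elif value % 16 != 0:
            problems.append(f"{key} must be divisible by 16")

    # (field, minimum, maximum, in-range fallback used when the field is omitted)
    ranges = (("cfgScale", 0, 30, 1), ("steps", 1, 150, 9), ("quantity", 1, 12, 1))
    for key, low, high, fallback in ranges:
        value = params.get(key, fallback)
        if not low <= value <= high:
            problems.append(f"{key} must be between {low} and {high}")

    return problems
```

Omitted optional fields always pass, since the orchestrator fills in the variant's defaults.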

## Reading the result

A successful `imageGen` step emits an `images[]` array with one entry per requested image (`quantity` entries in total):

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.jpeg" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
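
Pulling the URLs out of a fetched workflow is a short traversal of that shape. A minimal sketch, assuming the response has already been parsed into a dictionary; `image_urls` is a hypothetical helper name.

```python
def image_urls(workflow: dict) -> list[str]:
    """Collect image URLs from every succeeded step in a parsed workflow response."""
    urls = []
    for step in workflow.get("steps", []):
        if step.get("status") != "succeeded":
            continue  # steps that did not succeed carry no usable output
        output = step.get("output") or {}
        for image in output.get("images", []):
            urls.append(image["url"])
    return urls
```

Download the files promptly after calling this, since the signed URLs expire.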

## Runtime

| Variant | Typical wall time per 1024×1024 image | `wait` recommendation |
|---------|---------------------------------------|-----------------------|
| `turbo` (`cfgScale: 1`, `steps: 9`) — **default** | 3–8 s | `wait=30` fine for `quantity ≤ 4` |
| `base` (`cfgScale: 4`, `steps: 20`) | 8–20 s | `wait=60` fine for `quantity ≤ 2` |

Turbo's speed advantage shows up most clearly in batch mode: `quantity: 4` on turbo often finishes in the same wall-clock window as `quantity: 1` on base. For larger batches or dimensions, submit with `wait=0` and poll.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Both variants use the same per-pixel / per-step formula; they differ in base cost and reference step count:

```
total = baseCost × (width × height / 1024²) × (steps / referenceSteps) × quantity
```

| Variant | `baseCost` | `referenceSteps` | Defaults (1024², default steps, `quantity: 1`) |
|---------|------------|------------------|-----------------------------------------------|
| `turbo` | `8` | `9` | **~8 Buzz** |
| `base` | `20` | `20` | **~20 Buzz** |

Examples:

* Turbo at `quantity: 4` → ~32 Buzz
* Turbo at 1536×1024, `steps: 12` → ~8 × 1.5 × 1.33 ≈ **~16 Buzz**
* Base at 1024², `steps: 30` → ~20 × 1 × 1.5 ≈ **~30 Buzz**

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with unknown property | Field not valid for the ecosystem (e.g. `guidanceScale` — that's a Flux knob, not sdcpp) | Z-Image uses the sdcpp knob names: `cfgScale`, `steps`, `sampleMethod`, `schedule`. |
| `400` with "operation must be createImage" | Passed `editImage` or `createVariant` | Z-Image only supports `createImage` on either model. Use Flux 2 Klein or Flux 1 sdcpp if you need img2img / edit. |
| `400` with "ecosystem must be zImage" | Typo on the ecosystem slug | `"zImage"` — camelCase with capital I. Not `"z-image"`, `"zimage"`, `"Z-Image"`. |
| Turbo output looks washed out / low-detail | Step count too low for the prompt complexity | Bump `steps` to `10`–`12`; or switch to `base` if you need more. |
| Turbo ignores the negative prompt | `cfgScale: 1` effectively disables negative-prompt conditioning | Use `base` (with `cfgScale: 4`) if your workload depends on negative prompts. |
| Base output ignores the prompt | `cfgScale` too low or prompt too short | Raise `cfgScale` toward `5`; add lighting / composition cues. |
| LoRA silently has no effect | Wrong AIR URN, unpublished / private model, wrong ecosystem | Verify the URN on the LoRA's Civitai page; Z-Image LoRAs must be tagged for the `zImage` ecosystem. |
| Request timed out (`wait` expired) | Large `quantity`, large dimensions, or cold worker | Resubmit with `wait=0` and poll. Turbo is less likely to time out than base for the same dimensions. |
| Step `failed`, `reason = "blocked"` | Prompt hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Flux 2 image generation](./flux2) — higher-fidelity alternative with createVariant / editImage support
* [Flux 1 image generation](./flux1) — other sdcpp-hosted Flux ecosystem
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref`
* Full parameter catalog: the `ZImageBaseCreateImageGenInput` and `ZImageTurboCreateImageGenInput` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator