---
url: /orchestration/recipes/ace-step-audio.md
---

# ACE-Step music generation

[ACE-Step 1.5](https://github.com/ace-step/ACE-Step) is an open text-to-music model that produces full songs from a style description plus structured lyrics. The orchestrator exposes it through a single `aceStepAudio` step, which runs on Civitai's ComfyUI workers. The default checkpoint is the 2B turbo model (`ace_step_1.5_turbo_aio.safetensors`) — an eight-step distillation that generates a 30-second song in ~10 s of worker time.

Without a cover image the step emits an MP3 audio blob. Attach `cover.imageUrl` and the output is an MP4 video with that image as the still background, sized 512×512.

## Variants

There's one step type and one invocation path; the only variant axis is the optional `diffusionModel` override, which swaps the underlying diffusion checkpoint. All values come from Comfy-Org's [`ace_step_1.5_ComfyUI_files`](https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files) HuggingFace bundle. The default (unset) is the 2B turbo all-in-one checkpoint.

| Variant | `diffusionModel` | Params | `steps` | `cfg` | Best for |
|---|---|---|---|---|---|
| 2B turbo AIO *(unset)* | `urn:air:ace:checkpoint:huggingface:Comfy-Org/ace_step_1.5_ComfyUI_files@main/checkpoints/ace_step_1.5_turbo_aio.safetensors` | 2B | `8` | `1.0` | **Default** — single all-in-one file; fastest path. |
| 2B turbo | `urn:air:ace:checkpoint:civitai:2549270@2864880` | 2B | `8` | `1.0` | Split-file equivalent of the default AIO. Prefer the AIO unless you're already pulling split files. |
| 2B base | `urn:air:ace:checkpoint:civitai:2549270@2864864` | 2B | `50` | `~4` | Non-turbo 2B base — higher fidelity than turbo at the cost of sampling time. |
| XL turbo | `urn:air:ace:checkpoint:civitai:2549270@2864949` | 4B | `8` | `1.0` | More fidelity at turbo speed. Higher VRAM; slower first-submission while the worker pulls the split files. |
| XL base | `urn:air:ace:checkpoint:civitai:2549270@2864892` | 4B | `50` | `~4` | Highest-fidelity base 4B. Non-turbo; typically slowest. |
| XL SFT | `urn:air:ace:checkpoint:civitai:2549270@2864917` | 4B | `50` | `~4` | Supervised-fine-tuned 4B; sibling of XL base with the same runtime characteristics. |

Turbo variants are distilled to converge in 8 steps with CFG effectively off (`1.0`). Non-turbo base / SFT variants expect the full 50-step schedule with classifier-free guidance on (around `4`) — submitting them with the default `steps: 8` / `cfg: 1.0` produces underbaked output.

**Default choice for new integrations**: omit `diffusionModel` entirely. The 2B turbo AIO file is the default and is what Civitai's workers are consistently warm on. Reach for an XL split-file override only when the default fidelity isn't enough and you can tolerate a slow first-submission while the worker pulls the additional files.

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A `musicDescription` — a short, genre-prefixed style blurb (e.g. `"Neo-Soul: warm Rhodes, brush kit, introspective"`)
* A `lyrics` string — structured with section markers (`[Verse]`, `[Chorus]`, `[Bridge]`, …). Use `""` for pure instrumentals (and set `vocalWeight: 0.0` / `instrumentalWeight: 1.0`)
* A `seed` — any integer; same seed + same input reproduces the track deterministically

## Default (2B turbo, audio-only)

The default path — no `diffusionModel` override, no cover. Output is an MP3 blob.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <token>
Content-Type: application/json

{
  "steps": [{
    "$type": "aceStepAudio",
    "input": {
      "musicDescription": "Neo-Soul: A warm, organic neo-soul track with smooth Rhodes chords, mellow bass, and gentle drums. Soulful and introspective mood.",
      "lyrics": "[Verse 1]\nSunlight breaks through the morning haze\nCoffee steam rising, starting the day\n\n[Chorus]\nThis is the rhythm of my life\nSimple moments, pure delight",
      "duration": 30,
      "bpm": 95,
      "key": "D major",
      "language": "en",
      "seed": 12345
    }
  }]
}
```

## Instrumental (no vocals)

Drop vocals by pairing an empty `lyrics` string with `vocalWeight: 0.0` and `instrumentalWeight: 1.0`. The model still needs both fields — an empty `lyrics` with the default `vocalWeight` of 0.9 will produce scat-like placeholder vocals.

## Audio with cover image (MP4 output)

Attach `cover.imageUrl` and the step emits a `video` blob (`.mp4`) with the image as a static 512×512 background instead of an MP3.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <token>
Content-Type: application/json

{
  "steps": [{
    "$type": "aceStepAudio",
    "input": {
      "musicDescription": "Rock: A driving rock track with powerful guitars and thundering drums.",
      "lyrics": "[Intro]\n[Verse]\nBreaking through the walls tonight\nNothing is gonna stop this fight",
      "duration": 30,
      "bpm": 140,
      "key": "E minor",
      "seed": 42,
      "cover": {
        "imageUrl": "https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/07f78344-e165-4e96-8340-caf0e562f070/anim=false,width=450,optimized=true/1.jpeg"
      }
    }
  }]
}
```

`cover.imageUrl` accepts either a plain URL string or a workflow `$ref` pointing at an earlier step's output (e.g. chain an `imageGen` step to generate the album art, then feed it into `aceStepAudio` — see [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism)).

## Switching the diffusion model

Set `diffusionModel` to a full AIR URN. The 2B turbo AIO is the default; everything else is a drop-in override.
```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <token>
Content-Type: application/json

{
  "steps": [{
    "$type": "aceStepAudio",
    "input": {
      "musicDescription": "Cinematic Orchestral: Sweeping strings, bold brass, and thundering percussion.",
      "lyrics": "",
      "duration": 30,
      "bpm": 110,
      "key": "D minor",
      "instrumentalWeight": 1.0,
      "vocalWeight": 0.0,
      "seed": 3,
      "diffusionModel": "urn:air:ace:checkpoint:civitai:2549270@2864949"
    }
  }]
}
```

The split-file XL checkpoints require the worker to download them on first use, so a fresh submission can sit in `scheduled` for a minute or two before a worker is warm. Use the `wait=60` resume loop (see [Runtime](#runtime)) or webhooks — don't wait on a single `wait=60` POST for the first XL call.

## Parameters

| Field | Required | Default | Notes |
|---|---|---|---|
| `musicDescription` | ✅ | — | Style / genre description. Prefix with a genre label (`"Neo-Soul:"`, `"Jazz:"`) for best results. |
| `lyrics` | ✅ | — | Structured lyrics with `[Verse]`, `[Chorus]`, `[Bridge]` markers. Use `""` for pure instrumentals. |
| `seed` | ✅ | — | Any `int32`. Same inputs + same seed reproduce the track. |
| `duration` | | `60` | Seconds, range `1`–`190`. Longer durations increase Buzz linearly — see [Cost](#cost). |
| `bpm` | | `120` | Beats per minute, range `40`–`200`. |
| `timeSignature` | | `"4"` | Beats per measure. `"3"` / `"4"` / `"6"` common. |
| `language` | | `"en"` | Language code — `en`, `zh`, `ja`, `ko`, … |
| `key` | | `"C major"` | Musical key, e.g. `"E minor"`, `"Bb major"`. |
| `instrumentalWeight` | | `0.85` | Range `0.0`–`1.0`. Raise toward `1.0` for instrumental-heavy mixes. |
| `vocalWeight` | | `0.9` | Range `0.0`–`1.0`. Set to `0.0` when `lyrics` is empty or you want a pure instrumental. |
| `diffusionModel` | | *(2B turbo AIO)* | Full AIR URN for the diffusion checkpoint. See the [Variants](#variants) table. |
| `cover.imageUrl` | | *(none)* | URL (or workflow `$ref`) to a cover image. When set, output is an MP4 video with the image as the 512×512 background instead of an MP3. |

## Reading the result

Audio-only runs emit a single `audio` blob (MP3):

```json
{
  "status": "succeeded",
  "cost": { "total": 4 },
  "steps": [{
    "name": "$0",
    "$type": "aceStepAudio",
    "status": "succeeded",
    "output": {
      "blob": {
        "type": "audio",
        "id": "blob_....mp3",
        "available": true,
        "url": "https://orchestration-new.civitai.com/v2/consumer/blobs/blob_....mp3?sig=...&exp=...",
        "urlExpiresAt": "2027-04-14T15:13:40Z",
        "duration": 30,
        "jobId": "..."
      }
    },
    "jobs": [{
      "id": "...",
      "status": "succeeded",
      "startedAt": "2026-04-14T15:13:28.512Z",
      "completedAt": "2026-04-14T15:13:37.319Z",
      "cost": 4
    }]
  }]
}
```

Fields:

* **`blob.type`** — `"audio"` for MP3 output (no cover), `"video"` when `cover.imageUrl` was supplied (MP4 output).
* **`blob.id`** — stable blob key, ending in `.mp3` or `.mp4`.
* **`blob.url`** — signed URL. Fetch within `urlExpiresAt` or refetch the workflow / call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.
* **`blob.duration`** — on audio blobs only, the requested duration in seconds (echoes `input.duration`). Video blobs omit this and expose `width` / `height` (both 512) instead.
* **`blob.available`** — `true` once the file is persisted. Whatif previews return `false` because no job actually ran.

When `cover.imageUrl` is set, `blob` is a video blob — same shape, `type: "video"`, `.mp4` extension, `width: 512`, `height: 512`. Despite the C# source commenting "WebM", the current Civitai pipeline emits MP4.
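
As a rough illustration of the fields above, here is a minimal Python sketch that pulls the blob out of a terminal workflow response and checks whether its signed URL is still usable. The helper names `extract_blob` and `url_is_fresh`, the trimmed `sample` payload, and its placeholder URL are assumptions made for this example; only the field names (`steps[].output.blob`, `type`, `available`, `url`, `urlExpiresAt`) come from the response shown above.

```python
from datetime import datetime, timezone


def extract_blob(workflow: dict, step_index: int = 0) -> dict:
    """Return the output blob of one step, or raise if it is not usable yet."""
    step = workflow["steps"][step_index]
    if step.get("status") != "succeeded":
        raise RuntimeError(f"step has not succeeded: {step.get('status')}")
    blob = step["output"]["blob"]
    if not blob.get("available", False):
        # For example a whatif preview, where no job actually ran.
        raise RuntimeError("blob is not persisted")
    return blob


def url_is_fresh(blob: dict, now: datetime) -> bool:
    """True while the signed URL has not reached urlExpiresAt."""
    expires = datetime.fromisoformat(blob["urlExpiresAt"].replace("Z", "+00:00"))
    return now < expires


# Trimmed, hypothetical copy of the response above, used only to exercise the helpers.
sample = {
    "status": "succeeded",
    "steps": [{
        "$type": "aceStepAudio",
        "status": "succeeded",
        "output": {"blob": {
            "type": "audio",
            "id": "blob_....mp3",
            "available": True,
            "url": "https://example.invalid/blob.mp3",  # placeholder, not a real blob URL
            "urlExpiresAt": "2027-04-14T15:13:40Z",
            "duration": 30,
        }},
    }],
}

blob = extract_blob(sample)
kind = "MP4 video" if blob["type"] == "video" else "MP3 audio"
```

When `url_is_fresh` returns `False`, the options described above apply: refetch the workflow or call `GetBlob` for a new signed URL.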
## Runtime

Measured end-to-end against `orchestration.civitai.com` on 2026-04-14:

| Shape | POST → terminal |
|---|---|
| `duration: 30`, 2B turbo, no cover | ~15 s (job itself ~9 s) |
| `duration: 60`, 2B turbo, no cover | ~15 s (job itself ~14 s) |
| `duration: 30`, 2B turbo, with cover image | ~13 s (job itself ~7 s) |
| `duration: 30`, XL turbo (4B) cold worker | >60 s (needs `wait=60` resume loop; worker had to pull split files) |

The 2B turbo default beats the 60-s long-poll window comfortably for every duration up to the 190-s cap, so **submit with `wait=60` and expect the POST itself to return terminal state**. If it doesn't (cold XL variant, capacity pressure), the response comes back non-terminal at the 60-s ceiling — re-issue `GET /v2/consumer/workflows/{id}?wait=60` in a loop until the response is terminal. See [Results & webhooks](/orchestration/guide/results-and-webhooks) for the resume pattern.

For backend integrations that can't hold a connection, register a webhook URL and submit with `wait=0` (fire-and-forget).

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Cost is driven purely by `duration` — a flat base charge plus a per-second factor. Nothing else in the input affects price (model variant, BPM, cover image, instrumental weights, lyrics length are all free).

```
total = 1 + duration × 0.1
```

| Shape | Buzz |
|---|---|
| `duration: 10` (shortest useful clip) | 2 |
| `duration: 30` (default recipe example) | **4** |
| `duration: 60` (schema default) | 7 |
| `duration: 90` (typical full song) | 10 |
| `duration: 180` (near max, 3-minute track) | 19 |

Arithmetic check against the formula: `1 + 30 × 0.1 = 4` ✅, `1 + 60 × 0.1 = 7` ✅, `1 + 180 × 0.1 = 19` ✅. Prod whatif previews confirmed these exact Buzz figures on 2026-04-14.
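
For client-side estimates, the formula can be written as a small Python helper. This is only a sketch: the function name `ace_step_cost` is an assumption, the range check mirrors the `1`–`190` limit from the parameters table, and `whatif=true` remains the authoritative price.

```python
def ace_step_cost(duration_seconds: int) -> float:
    """Estimate Buzz for an aceStepAudio step: flat 1 plus 0.1 per second."""
    if not 1 <= duration_seconds <= 190:
        raise ValueError("duration must be between 1 and 190 seconds")
    # Dividing by 10 is equivalent to multiplying by 0.1 and avoids float rounding noise.
    return 1 + duration_seconds / 10


# Reproduce the table rows above.
estimates = {d: ace_step_cost(d) for d in (10, 30, 60, 90, 180)}
```

The helper returns the unrounded value, so a duration that is not a multiple of 10 yields a fractional estimate.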
The orchestrator surfaces the raw `Factors["total"]` value — non-integer formula outputs (e.g. `duration: 15` → `2.5`) are passed through unchanged in `cost.total`; there's no `Math.Ceiling` / `Math.Round` in the handler.

Cover images, key, BPM, time signature, language, and instrumental / vocal weights don't affect Buzz price — ACE-Step bills flat-plus-per-second on duration only.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `400` with `"duration must be between 1 and 190"` (or similar range complaint) | `duration` outside `[1, 190]`, `bpm` outside `[40, 200]`, or a weight outside `[0.0, 1.0]` | Clamp the field to the range in the parameters table. |
| `400` with `"musicDescription is required"` / `"lyrics is required"` / `"seed is required"` | Missing one of the three required fields. `lyrics: ""` is valid; the field itself must still be present. | Include every required field explicitly. |
| `400` with `"Unable to analyze … file"` on the cover image | `cover.imageUrl` pointed at a host that rejected the orchestrator's fetch (range requests, UA block, ALB cookie gating) | Use a Civitai CDN URL, or generate the cover with an `imageGen` step and `$ref` its output. |
| Output has scat-like placeholder vocals on an "instrumental" track | `lyrics: ""` but `vocalWeight` left at default `0.9` | Set `vocalWeight: 0.0` (and ideally `instrumentalWeight: 1.0`) whenever `lyrics` is empty. |
| Step `failed`, `reason = "blocked"` | Content moderation on the description / lyrics / cover image | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |
| Workflow stuck in `scheduled` for >60 s on an XL `diffusionModel` override | No warm worker has the split-file checkpoint yet; the first submission of a given XL variant triggers a download | Keep polling with `?wait=60`; subsequent submissions in the same hour land on the now-warm worker in ~15 s. |
| Request timed out (`wait=60` returned non-terminal) | Cold XL variant, capacity pressure, or `duration` near 190 s on a busy shard | Re-issue `GET /v2/consumer/workflows/{id}?wait=60` until the response is terminal. |

## Related

* [`InvokeAceStepAudioStepTemplate`](/orchestration/reference/operations/InvokeAceStepAudioStepTemplate) — the per-recipe endpoint
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — generic path for chaining `aceStepAudio` into multi-step workflows
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for the `wait=60` resume loop
* [Transcription](./transcription) — inverse direction (audio → text); chain after `aceStepAudio` to auto-caption a track
* [Text-to-speech](./text-to-speech) — sibling audio recipe for spoken output
* [Flux 2 image generation](./flux2) — common upstream for generating cover art to feed into `cover.imageUrl`
* [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) — for chaining an `imageGen` cover generator into this step
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running submissions (cold XL variants, webhooks)
* Full parameter catalog: the `AceStepAudioInput` schema in the [API reference](/orchestration/reference/)
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/aceStepAudio/openapi.yaml) — standalone OpenAPI 3.1 YAML for this endpoint

---

---
url: /site/guide/air.md
description: >-
  The AI Resource Identifier (AIR) URN format used across Civitai and the
  Orchestration API.
---

# AIR identifiers

An **AI Resource Identifier** (AIR) is the canonical URN-style string Civitai uses to reference any AI resource — a checkpoint, LoRA, VAE, embedding, or upscaler — consistently across the site API, the Orchestration API, and partner integrations.
Every response from [`GET /model-versions/{id}`](../reference/model-versions#get-a-model-version) includes an `air` field you can pass directly to generation APIs.

## Format

```
urn:air:{ecosystem}:{type}:{source}:{id}[@{version}][+{fileId}][.{format}]
```

The `urn:` and `air:` prefixes are both optional — parsers accept `urn:air:sdxl:checkpoint:civitai:827184@2514310`, `air:sdxl:checkpoint:civitai:827184@2514310`, and bare `sdxl:checkpoint:civitai:827184@2514310` interchangeably. **Use the full `urn:air:...` form** in API requests; it's the unambiguous canonical form.

### Fields

| Field | Required | Description |
|-------|----------|-------------|
| `ecosystem` | Optional | Model family bucket: `sd15`, `sdxl`, `sd3`, `flux1`, `other`, etc. |
| `type` | Optional | Resource kind: `checkpoint`, `lora`, `embedding`, `vae`, `controlnet`, `upscaler`. |
| `source` | Required | Hosting system: `civitai`, `civitai-r2`, `huggingface`, `orchestrator`. |
| `id` | Required | Resource identifier within the source. For `civitai`, this is the **model ID**. |
| `version` | Optional | Specific version (for `civitai` this is the model version ID). If omitted, the resource's default/latest version is implied. |
| `fileId` | Optional | Specific `ModelFile` id, prefixed with `+`. Disambiguates between multiple files attached to the same version (e.g. a pruned vs. full-weight checkpoint, or a base model shipped alongside its text-encoder file). Omit to let the resolver pick the primary file. |
| `format` | Optional | Model file format, e.g. `safetensor`, `ckpt`, `diffuser`. |

## Real examples

From actual `GET /api/v1/model-versions/{id}` responses and internal workflow templates:

```
urn:air:sdxl:checkpoint:civitai:827184@2514310
urn:air:sdxl:checkpoint:civitai:827184@2514310+2402203
urn:air:illustrious:checkpoint:civitai:795765@900661
urn:air:other:upscaler:civitai:147759@164821
urn:air:other:other:civitai-r2:civitai-worker-assets@sam_vit_b_01ec64.pth
```

The second example pins the AIR to a specific file on the version (e.g. `waiIllustriousSDXL_v160.safetensors`, file id `2402203`) — useful when a version ships multiple downloadable artifacts and you need to be explicit about which one to load. The last one is a file asset (SAM ViT-B checkpoint) stored on Civitai's R2 bucket rather than a model version.

## Type values

The `type` segment maps to Civitai's `ModelType` enum:

| AIR type | Civitai `ModelType` |
|----------|---------------------|
| `checkpoint` | `Checkpoint` |
| `lora` | `LORA` |
| `embedding` | `TextualInversion` |
| `vae` | `VAE` |
| `controlnet` | `Controlnet` |
| `upscaler` | `Upscaler` |

Resources that don't map to one of those (motion modules, detection models, wildcards, etc.) use `other` as the type.

## Using AIR with the Orchestration API

The Orchestration API accepts AIR strings anywhere a resource is referenced. Given a `modelVersionId` from the site API, the simplest way to get a valid AIR is to call `GET /api/v1/model-versions/{id}` and forward the `air` field.

For example, to use `WAI-illustrious-SDXL v16.0` in a text-to-image workflow:

1. `curl https://civitai.com/api/v1/model-versions/2514310` → `"air": "urn:air:sdxl:checkpoint:civitai:827184@2514310"`
2. Pass that string as the checkpoint reference in your [Orchestration submission](/orchestration/guide/submitting-work).
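
If you need to inspect an AIR on the client before forwarding it, the format can be picked apart with a regular expression. The Python sketch below is an illustration, not Civitai's parser: the `Air` tuple, the `parse_air` function, and the pattern are assumptions made for this example, and the pattern deliberately covers only the `civitai` source with numeric model, version, and file ids. Other sources (`civitai-r2`, `huggingface`, `orchestrator`) use free-form ids and are rejected here.

```python
import re
from typing import NamedTuple, Optional


class Air(NamedTuple):
    ecosystem: Optional[str]
    type: Optional[str]
    source: str
    id: str
    version: Optional[str]
    file_id: Optional[str]
    format: Optional[str]


# Assumption: numeric ids, source fixed to "civitai"; prefixes are optional.
_AIR_RE = re.compile(
    r"^(?:urn:)?(?:air:)?"
    r"(?:(?P<ecosystem>[a-z0-9_-]+):)?"
    r"(?:(?P<type>[a-z0-9_-]+):)?"
    r"(?P<source>civitai):"
    r"(?P<id>\d+)"
    r"(?:@(?P<version>\d+))?"
    r"(?:\+(?P<file_id>\d+))?"
    r"(?:\.(?P<format>[a-z0-9]+))?$"
)


def parse_air(value: str) -> Air:
    """Split a civitai-source AIR into its fields; raise ValueError otherwise."""
    match = _AIR_RE.match(value)
    if match is None:
        raise ValueError(f"not a civitai-source AIR this sketch understands: {value!r}")
    return Air(**match.groupdict())


pinned = parse_air("urn:air:sdxl:checkpoint:civitai:827184@2514310+2402203")
```

Here `pinned.id` is the model ID, `pinned.version` the model version ID, and `pinned.file_id` the pinned `ModelFile` id from the second example above.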
## Building an AIR by hand

You can also construct an AIR directly from a Civitai model version:

```
urn:air:{baseModel}:{type}:civitai:{modelId}@{versionId}[+{fileId}]
```

Where `baseModel` comes from the model version's `baseModel` field (`SDXL 1.0` → `sdxl`, `SD 1.5` → `sd15`, etc.) and `type` maps from the parent model's `type` field as shown in the table above. Append `+{fileId}` (using a `ModelFile.id` from `files[]` on the model version response) only when you need to pin a specific file; otherwise the resolver picks the primary file.

The site-generated `air` field already handles this mapping — prefer it over hand-construction when you have the option.

---

---
url: /orchestration/recipes/anima.md
---

# Anima image generation

Anima is an anime-focused image generation ecosystem on Civitai's sdcpp workers. Single engine path, one operation (`createImage` — no img2img or edit support), optimized defaults for anime/illustration output:

* `engine: "sdcpp"`, `ecosystem: "anima"`
* **Only `createImage`** — Anima doesn't expose `createVariant` or `editImage`. Use [Flux 2 Klein](./flux2#klein-createvariant-img2img) or [Qwen](./qwen) if you need img2img or prompt-driven editing.
* Higher default `steps` (`30`) and lower default `cfgScale` (`4`) than the SD ecosystems — tuned for anime output
* Supports LoRAs for style/character injection
* No checkpoint URN needed — the ecosystem ships its own model; an optional `diffuserModel` override exists for advanced cases

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* No checkpoint URN required — Anima uses a built-in diffuser

## Text-to-image

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "anima",
      "operation": "createImage",
      "prompt": "masterpiece, best quality, 1girl, solo, portrait, looking at viewer, cinematic lighting",
      "negativePrompt": "worst quality, low quality, blurry, bad anatomy, deformed hands",
      "width": 1024,
      "height": 1024,
      "cfgScale": 4,
      "steps": 30
    }
  }]
}
```

### Parameters

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `prompt` | — ✅ | ≤ 10 000 chars | Booru-style tags work best. Lead with quality boosters (`masterpiece, best quality, …`). |
| `negativePrompt` | *(none)* | ≤ 10 000 chars | Recommended. `worst quality, low quality, blurry, bad anatomy, deformed hands` is a solid starting point. |
| `width` / `height` | `1024` | `64`–`2048`, divisible by 16 | Anima is trained around 1024² and well-behaved aspect ratios near that pixel count. |
| `cfgScale` | `4` | `0`–`30` | **Lower than SD1/SDXL's 7.** `3`–`5` is the sweet spot for Anima. |
| `steps` | `30` | `1`–`150` | **Higher than most sdcpp defaults.** `25`–`35` typical. |
| `sampleMethod` | `euler` | enum | [`SdCppSampleMethod`](/orchestration/reference/). |
| `schedule` | `simple` | enum | [`SdCppSchedule`](/orchestration/reference/). |
| `loras` | `{}` | `{ airUrn: strength }` | Stack multiple; `0.6`–`1.0` strengths typical. |
| `diffuserModel` | *(built-in)* | AIR URN | Optional override for the diffuser. The default built-in model is what you want in almost every case. |
| `quantity` | `1` | `1`–`12` | Number of images per call. |
| `seed` | random | int64 | Pin for reproducibility. |

### Aspect-ratio variants

Anima handles non-square aspect ratios well near ~1 megapixel total area — similar guidance to SDXL. Well-behaved dimensions include 1024², 1152×896, 1344×768, 1536×640, and their mirrors.

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "anima",
      "operation": "createImage",
      "prompt": "masterpiece, best quality, cyberpunk anime scene, neon city street at night",
      "negativePrompt": "worst quality, low quality, blurry",
      "width": 1344,
      "height": 768,
      "cfgScale": 4,
      "steps": 30
    }
  }]
}
```

### With LoRAs

Anima LoRAs are a map of AIR URN → strength. Style LoRAs usually sit at `0.6`–`1.0`; character / concept LoRAs often higher:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "anima",
      "operation": "createImage",
      "prompt": "masterpiece, best quality, detailed portrait of a magical girl in a forest",
      "negativePrompt": "worst quality, low quality",
      "width": 1024,
      "height": 1024,
      "cfgScale": 4,
      "steps": 30,
      "loras": {
        "urn:air:anima:lora:civitai:123456@789012": 0.8
      }
    }
  }]
}
```

Only Anima-tagged LoRAs work on the `anima` ecosystem.

## Reading the result

A successful `imageGen` step emits an `images[]` array — one entry per `quantity`:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.jpeg" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Runtime

Typical wall time per 1024×1024 image is 10–25 s. `wait=60` works comfortably for `quantity ≤ 2`.
Higher `steps` counts and larger dimensions compound runtime; submit with `wait=0` and poll for large batches or atypical aspect ratios.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Per-pixel + per-step scaling against 1024² / 25 steps:

```
total = 8 × (width × height / 1024²) × (steps / 25) × quantity
```

| Shape | Buzz |
|-------|------|
| 1024²/`steps: 30`/`quantity: 1` (defaults) | **~9.6** |
| 1024²/`steps: 30`/`quantity: 4` | ~38 |
| 1344×768/`steps: 30` | ~7.9 × 1.2 ≈ **~9.5** |
| 1024²/`steps: 40` | ~12.8 |

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "operation must be createImage" | Passed `editImage` or `createVariant` | Anima only supports `createImage`. Use [Qwen](./qwen) or [Flux 2 Klein](./flux2#klein-createvariant-img2img) for img2img / edit on anime-style inputs. |
| `400` with "ecosystem must be anima" | Typo | Lowercase `"anima"`. |
| `400` with "model is not a valid property" | Sent `model` field | Anima has no checkpoint picker — delete the field, or if overriding, use `diffuserModel` instead. |
| Output looks flat or off-style | `cfgScale: 7` (SD default) on Anima | Drop to `cfgScale: 4`. Anima wants lower guidance than SD1/SDXL. |
| Output underbakes | `steps` too low for the prompt complexity | Bump to `steps: 30`–`40`. Anima's default is already `30` — don't go much below `20`. |
| LoRA has no effect | Wrong AIR URN, model private / not published, or ecosystem mismatch | Verify the URN on the LoRA's Civitai page; only Anima-tagged LoRAs work on the `anima` ecosystem. |
| Request timed out (`wait` expired) | Large `quantity`, atypical dimensions, or high `steps` | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Prompt hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Qwen image generation](./qwen) — alternative with edit + variant operations and LoRA support
* [SDXL image generation](./sdxl) — higher-fidelity general-purpose alternative
* [Flux 2](./flux2) / [Flux 1](./flux1) image generation — newer open-weights families
* [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output
* [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref`
* Full parameter catalog: the `AnimaCreateImageGenInput` schema in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator

---

---
url: /orchestration/reference.md
---

# API Reference

Every consumer-facing operation, request schema, and response shape in the Civitai Orchestration API. Pages here are generated from the OpenAPI specification ([`v2-consumers.json`](https://orchestration.civitai.com/openapi/v2-consumers.json)) and stay in sync with the running API on every build.

## Conventions

* **Base URL**: `https://orchestration.civitai.com`
* **Auth**: `Authorization: Bearer <token>` on every request.
* **Content type**: `application/json` for bodies; blob upload endpoints accept `multipart/form-data` or presigned PUT.
* **IDs**: workflow IDs are ULIDs prefixed `wf_`; blob IDs are prefixed `blob_`.
* **Polymorphism**: workflow step bodies use a `$type` discriminator; request/response schemas list all valid subtypes under `oneOf`.

## Entry points

Most consumer integrations only touch three operations:

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — create a workflow with one or more steps
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — poll a single workflow
* [`QueryWorkflows`](/orchestration/reference/operations/QueryWorkflows) — list / filter workflows

The left sidebar is grouped by OpenAPI tag — **Workflows**, **WorkflowSteps**, **Recipes**, **Blobs**, **Resources**. Recipes have per-endpoint variants (one per job type) if you prefer the typed surface over the polymorphic `SubmitWorkflow` body.

## Rate limits & quotas

::: info Stub
Fill in once the per-tier rate limit scheme is finalized.
:::

---

---
url: /orchestration/guide/authentication.md
---

# Authentication

All consumer endpoints require `Authorization: Bearer <token>` on every request.

## Getting an API key

Manage your API keys from your Civitai account at **[civitai.com](https://civitai.com)** — generate new keys, revoke old ones, and copy tokens from there.

Treat API keys like passwords: never commit them to source control, and rotate them if you suspect exposure.

## Using the token

```http
Authorization: Bearer <token>
```

All requests go to `https://orchestration.civitai.com`.

## Try It in the docs

Most pages on this site have a **Run** widget under each example. Click the **Token** button in the top-right of the navbar to paste your Bearer token; it's stored in your browser's `localStorage` and used for every Run / Reference Try-It on the site. The token never leaves your browser except in the `Authorization` header it sends to `orchestration.civitai.com`.

The widget supports:

* **Preview cost** — submits with `whatif=true`, shows a per-currency Buzz breakdown.
* **Submit for real** — runs the workflow with `wait=90`, then polls [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) until terminal.
* **Inline preview** — generated images and videos render in the page once the workflow finishes.

Reference operation pages have their own playground panel from the OpenAPI viewer (with its own auth field — paste once, persists across reloads).

::: info Stub
Expand once finalized: token scopes, rate limits per tier, rotation policy, how to request elevated access.
:::

---

---
url: /site/guide/authentication.md
description: How to authenticate with the Civitai site API using bearer tokens.
---

# Authentication

The Civitai site API uses **bearer tokens** generated from your account settings. A single token covers every endpoint that accepts authentication.

::: info Building a third-party app?
Use [OAuth](/site/oauth/) instead of personal API keys — users authorize your app explicitly with the scopes it needs and can revoke it any time, without rotating anything on your side.
:::

## How to pass the token

Two methods are supported. The header form is strongly preferred; the query-param form exists mainly for download-tool compatibility and leaks the token into access logs and caches.

### Authorization header (preferred)

```bash
curl -H "Authorization: Bearer $CIVITAI_TOKEN" \
  "https://civitai.com/api/v1/me"
```

### Query parameter

```bash
curl "https://civitai.com/api/v1/me?token=$CIVITAI_TOKEN"
```

## Which endpoints require a token?

Endpoints fall into three categories:

| Category | Behavior without a token | Examples |
|---|---|---|
| **Public** | Full access. | `GET /creators`, `GET /tags`, `GET /images`, `GET /models/{id}`, `GET /model-versions/*` |
| **Mixed** | Accessible, but some filter params or fields may be unavailable. | `GET /models` (the `favorites` and `hidden` query params require auth) |
| **Authenticated** | `401 Unauthorized`. | `GET /me` |

Each page in the [Reference](../reference/) notes which category an endpoint falls into.

## What 401 looks like

Calling an authenticated endpoint without a token — or with an invalid one — returns:

```
HTTP/2 401
Content-Type: application/json

{"error":"Unauthorized"}
```

Mixed endpoints silently degrade to anonymous access when no token is provided; they only return 401 if you pass an auth-only filter (e.g. `?favorites=true`) without a valid session.

## Caching and auth

Public endpoints set `Cache-Control: public, s-maxage=300, stale-while-revalidate=150` — responses are cached for 5 minutes at the edge. When you call an endpoint *with* a valid token, caching is skipped so personalized responses aren't shared.

CORS is open for public endpoints (`Access-Control-Allow-Origin: *`); authenticated requests are restricted to Civitai-owned origins.

## Security tips

* Tokens are account-scoped. Rotating one means rotating everywhere it's used.
* If you suspect a leak, delete the key from your [account settings](https://civitai.com/user/account) and issue a new one.
* Prefer the `Authorization` header over `?token=`; query params end up in server logs, browser history, and proxy caches.
* Never embed a token in client-side code shipped to browsers or mobile apps.

---

---
url: /site/oauth/buzz-limits.md
description: >-
  How Civitai users cap an OAuth app's buzz spending, and what your app should
  expect at runtime.
---

# Buzz spend limits

OAuth tokens that include `AIServicesWrite` authorize your app to spend the user's buzz on AI services (generation, training, scanning). To keep that authorization sane, the consent flow lets users cap how much an app can spend, and they can change the cap later from civitai.com.

Your app doesn't set or change the limit — the user does — but knowing what they see and how it surfaces at runtime will save you a lot of debugging.
::: info Scope of the cap Per-app buzz caps are enforced by the orchestrator, so they only apply to **orchestrator-mediated spend** — every AI-services call your token makes. Other buzz-spending scopes that an OAuth token can carry (notably `BountiesWrite`, which lets the user create bounties) are gated by the user's overall balance but are **not** subject to the per-app cap. ::: ## How users set a limit When the user reaches the consent screen for a scope that includes `AIServicesWrite`, Civitai shows a budget control alongside the scope list. The current UI exposes a single "sliding window" budget — buzz limit + period — but the underlying schema is more flexible. After consent, users manage existing limits from **Account → Connected Apps**. They can: * Edit the limit per app. * Remove the limit entirely (no cap). * Revoke the app outright (which is a stronger action — invalidates all the app's tokens). ## Budget shape Limits are stored as an array of budgets. Each budget is one of: | Type | Fields | Meaning | |---|---|---| | `absolute` | `limit`, optional `currencies` | Hard cap. Once hit, no more spending on those currencies until the user resets. | | `sliding` | `limit`, `unit`, `window`, optional `currencies` | Rolling window — e.g. `unit: 7, window: "day"` is "no more than `limit` in any 7-day stretch." This is what the simple UI ships. | | `rollover` | `limit`, `cron`, optional `currencies` | Calendar-based reset on a cron expression (e.g. monthly reset on the 1st). | `currencies` (when set) restricts the budget to specific buzz pools — leave it off and the budget covers every buzz currency. Your app **doesn't read** this structure directly — it's stored per-user and enforced server-side. You'll only ever see its effect: spend calls succeed or fail. 
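
To make the `sliding` semantics concrete, here is a minimal Python sketch of how a rolling-window check could behave. It is illustrative only: enforcement happens server-side and its real implementation is not described on this page. The names `SlidingBudget` and `allows`, the `(timestamp, amount)` spend-log shape, and every `window` key other than `"day"` are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical mapping from the `window` field to a duration. Only "day"
# appears in the text above; the other keys are assumptions.
WINDOW_LENGTHS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


@dataclass
class SlidingBudget:
    """Illustrative model of a `sliding` budget (not an official type)."""

    limit: int   # max buzz spendable inside the window
    unit: int    # number of `window` periods, e.g. 7
    window: str  # period name, e.g. "day"

    def span(self) -> timedelta:
        """Total length of the rolling window."""
        return self.unit * WINDOW_LENGTHS[self.window]

    def allows(self, spends, now, amount) -> bool:
        """Return True if spending `amount` at `now` stays within `limit`.

        `spends` is a list of (timestamp, amount) pairs already charged.
        """
        cutoff = now - self.span()
        recent = sum(a for t, a in spends if t > cutoff)
        return recent + amount <= self.limit


budget = SlidingBudget(limit=1000, unit=7, window="day")
now = datetime(2025, 1, 10)
spends = [
    (datetime(2025, 1, 1), 600),  # older than 7 days, no longer counted
    (datetime(2025, 1, 8), 700),  # inside the window
]
print(budget.allows(spends, now, 300))  # 700 + 300 fits the 1000 limit
print(budget.allows(spends, now, 301))  # 700 + 301 exceeds it
```

The point of the sketch is that a sliding budget frees up capacity gradually as old spends age out of the window, unlike an `absolute` cap, which stays exhausted until the user resets it.
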
## What your app sees at runtime When the orchestrator blocks a spend — for either "user is broke" **or** "user's per-app cap is hit" — Civitai surfaces it the same way: ```json { "code": "BAD_REQUEST", "message": "Hey buddy, seems like you don't have enough funds to perform this action." } ``` (The `message` may be replaced by an orchestrator-provided detail string when a per-app limit is what tripped the call — but the response **code is the same** either way.) There's no separate error code that lets you distinguish "out of buzz" from "capped by the user". If you need to give a precise message to the user, parse `message` defensively, or check the user's per-app spend state via [`GET /api/v1/me`](../reference/users) ahead of the call and present a likely-cause hint based on whether a limit is set. ::: warning Don't rely on message text for programmatic decisions The exact default message string above comes from [`throwInsufficientFundsError`](https://github.com/civitai/civitai)'s helper and may change. Treat anything beyond the HTTP/RPC code as human-readable only. ::: ## Best practices for buzz-spending clients * **Surface the user's balance.** Call [`GET /api/v1/me`](../reference/users) periodically and show buzz in your UI — users hate guessing whether their next click will be denied. * **Use `whatif=true` for cost preview**, not for limit detection. The orchestration `whatif` mechanism ([see the orchestration guide](../../orchestration/guide/submitting-work)) is designed to give you a per-currency cost breakdown before you submit for real; treat it as a costing tool, not a "will this be denied?" oracle. * **Don't retry on insufficient-funds errors.** Whether it's a real shortfall or the user's per-app cap, retrying won't help until balance or limits change. Show the user the error and let them resolve it. * **Treat token revocation as expected.** A user who hits their cap may decide to revoke your app entirely from civitai.com. 
Your refresh-token call will return `invalid_grant`; handle that by sending the user back through `/authorize` (with messaging that explains why). * **Never persist budget assumptions across sessions.** Users can change their cap any time; treat each spend call as the source of truth. ## When you don't need buzz scopes If your app doesn't spend buzz on the user's behalf — e.g. a read-only analytics dashboard, or one that submits work using **your own** `client_credentials` token — don't request `AIServicesWrite`. Users won't see the buzz-cap UI, and you skip a whole category of failure modes. --- --- url: /orchestration/recipes/chat-completion.md --- # Chat completion `chatCompletion` routes text (and optionally image) inputs through large language models. Any model available on [OpenRouter](https://openrouter.ai/models) is supported, plus Civitai-hosted AIR models. The request and response shapes follow the OpenAI Chat Completions API. ## Access paths Two ways to use chat completion, depending on your use case: | Path | When to use | |------|-------------| | **`POST /v1/chat/completions`** | Drop-in replacement for the OpenAI API. Accepts `stream: true` for SSE streaming. | | **`chatCompletion` workflow step** | Chain with other steps (`imageGen`, `convertImage`, etc.) in a multi-step workflow. | Both paths share the same input schema and produce the same output format. ## Basic text completion ### Via the OpenAI-compatible endpoint ```http POST https://orchestration.civitai.com/v1/chat/completions Authorization: Bearer Content-Type: application/json { "model": "openai/gpt-4o-mini", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "What is the capital of France?" 
} ] } ``` ### Via SubmitWorkflow ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60 Authorization: Bearer Content-Type: application/json { "steps": [{ "$type": "chatCompletion", "input": { "model": "openai/gpt-4o-mini", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "What is the capital of France?" } ] } }] } ``` ## Vision (image inputs) Pass images in user message content parts. Any vision-capable model (e.g. `openai/gpt-4o`, `google/gemini-2.0-flash`) can process them. ```json { "model": "openai/gpt-4o", "messages": [{ "role": "user", "content": [ { "type": "text", "text": "Describe this image in detail." }, { "type": "image_url", "image_url": { "url": "https://image.civitai.com/.../photo.jpeg", "detail": "auto" } } ] }], "max_tokens": 300 } ``` `detail` can be `"auto"` (default), `"low"`, or `"high"`. The image source can be a public URL, a data URL (`data:image/jpeg;base64,...`), or raw Base64 — the orchestrator uploads it to blob storage before dispatching the job. ## Image generation Set `"modalities": ["image", "text"]` on the request to generate images through `/v1/chat/completions`. The response carries an `images` array on the assistant message, where each entry is a base64 data URI — the same shape OpenRouter uses, so existing OpenRouter-style SDK code works unmodified. ```http POST https://orchestration.civitai.com/v1/chat/completions Authorization: Bearer Content-Type: application/json { "model": "google/gemini-2.5-flash-image", "messages": [ { "role": "user", "content": "A cat in a teacup, soft window light" } ], "modalities": ["image", "text"], "image_config": { "aspect_ratio": "1:1", "image_size": "1K" } } ``` Response: ```json { "id": "chatcmpl-...", "model": "google/gemini-2.5-flash-image", "choices": [{ "index": 0, "message": { "role": "assistant", "content": "", "images": [{ "type": "image_url", "image_url": { "url": "data:image/png;base64,iVBOR..." 
} }] }, "finish_reason": "stop" }] } ``` ### Image editing (multi-turn) Pass a prior generated image (or any image URL / data URI) as a content part on a user message and the request routes through the engine's edit operation: ```json { "model": "google/gemini-2.5-flash-image", "messages": [{ "role": "user", "content": [ { "type": "text", "text": "Make it a dog instead of a cat." }, { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } } ] }], "modalities": ["image", "text"] } ``` ### Supported models | `model` | Engine | Operations | |---------|--------|------------| | `google/gemini-2.5-flash-image` | Gemini 2.5 Flash Image | create, edit | | `openai/gpt-image-1` | OpenAI gpt-image-1 | create, edit | | `openai/dall-e-3` | OpenAI DALL·E 3 | create | | `openai/dall-e-2` | OpenAI DALL·E 2 | create, edit | | `black-forest-labs/flux.2-dev` | Flux 2 Dev | create, edit | | `black-forest-labs/flux.2-flex` | Flux 2 Flex | create, edit | | `black-forest-labs/flux.2-pro` | Flux 2 Pro | create, edit | | `black-forest-labs/flux.2-max` | Flux 2 Max | create, edit | | `black-forest-labs/flux.2-klein` | Flux 2 Klein | create, edit | The provider prefix (`google/`, `openai/`, `black-forest-labs/`) is optional — short names like `gemini-2.5-flash-image`, `gpt-image-1`, `flux-2-dev` are also accepted. Unknown model names with `modalities: ["image"]` return `400` with the supported list. ### Civitai AIR URNs Pass a Civitai [AIR](/site/guide/air) URN as `model` to use a community checkpoint. 
The ecosystem segment of the AIR (`sd1`, `sdxl`, `flux1`, `anima`) selects the engine; the AIR is forwarded as the checkpoint: ```json { "model": "urn:air:sdxl:checkpoint:civitai:101055@128078", "messages": [{ "role": "user", "content": "A cyberpunk samurai" }], "modalities": ["image", "text"], "image_config": { "aspect_ratio": "1:1", "image_size": "1K" } } ``` | Ecosystem | Engine | Operations | Notes | |-----------|--------|------------|-------| | `sd1` | SD 1.5 (sd-cpp) | create, variant | Pass an input image to trigger img2img variant. | | `sdxl` | SDXL (sd-cpp) | create, variant | Same — img2img variant when an input image is supplied. | | `flux1` | Flux 1 (sd-cpp) | create, edit | Edit operation accepts up to 2 input images; width/height clamped to 832–1216. | | `anima` | Anima (sd-cpp) | create | Anima checkpoints; no img2img path through chat-completions. | Other ecosystems (`zimage`, `qwen`, `wan`, `flux2`) hardcode their checkpoints — pass the matching named model instead (e.g. `flux-2-dev`) and use the [`imageGen` workflow step](./flux2) directly when you need to override the checkpoint. ### `image_config` | Field | Values | Effect | |-------|--------|--------| | `aspect_ratio` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `21:9` | Sets width/height ratio. OpenAI engines snap to their nearest allowed size. Gemini ignores (always 1024×1024). | | `image_size` | `0.5K`, `1K`, `2K`, `4K` | Approximate megapixel target. Engines clamp to their supported range. | | `n` | 1–10 | Number of images. Falls back to the top-level `n`. Engines clamp to their supported max. | For full per-engine knobs (samplers, LoRAs, guidance scales, advanced operations), use the [`imageGen` workflow step](./flux2) directly instead — chat-completions is a thin facade tuned for SDK compatibility, not a full passthrough of every engine parameter. 
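
As a rough sketch of how a client might assemble an image-generation body, the Python helper below checks `image_config` against the values listed in the table above before building the request. `build_image_request` is a hypothetical name, not part of any SDK, and the validation is a client-side convenience only: engines still clamp or ignore values as described, so the server remains the source of truth.

```python
import json

# Values copied from the `image_config` table above.
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "21:9"}
IMAGE_SIZES = {"0.5K", "1K", "2K", "4K"}


def build_image_request(model, prompt, aspect_ratio="1:1", image_size="1K", n=1):
    """Assemble a /v1/chat/completions body for image generation.

    Hypothetical client-side helper; it builds the body but sends nothing.
    """
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"unsupported image_size: {image_size}")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "modalities": ["image", "text"],
        "image_config": {
            "aspect_ratio": aspect_ratio,
            "image_size": image_size,
            "n": n,
        },
    }


body = build_image_request(
    "google/gemini-2.5-flash-image",
    "A cat in a teacup, soft window light",
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and posted with any HTTP client, using the same `Authorization` header as the other examples on this page.
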
## Multi-turn conversations Include prior turns as `assistant` messages to maintain context: ```json { "model": "openai/gpt-4o-mini", "messages": [ { "role": "system", "content": "You are a concise assistant." }, { "role": "user", "content": "Write a haiku about the ocean." }, { "role": "assistant", "content": "Waves crash endlessly,\nSalt..." }, { "role": "user", "content": "Now write one about mountains." } ], "temperature": 0.7 } ``` ## Streaming ### Via `/v1/chat/completions` Set `"stream": true` and handle Server-Sent Events (SSE). The response is a stream of `data: {...}` lines ending with `data: [DONE]`: ```http POST https://orchestration.civitai.com/v1/chat/completions Authorization: Bearer Content-Type: application/json { "model": "openai/gpt-4o-mini", "messages": [{ "role": "user", "content": "Tell me a short story." }], "stream": true } ``` ### Via workflow step Set `stream: true` in the step `metadata` field: ```json { "steps": [{ "$type": "chatCompletion", "metadata": { "stream": true }, "input": { "model": "openai/gpt-4o-mini", "messages": [{ "role": "user", "content": "Tell me a short story." }] } }] } ``` When streaming is enabled, the orchestrator stores the raw NDJSON chunks in a streaming blob and assembles them into the standard `ChatCompletionOutput` shape for the workflow output. ## Tool use (function calling) Define tools as JSON Schema function definitions. The model decides when and how to call them: ```json { "model": "openai/gpt-4o", "messages": [ { "role": "user", "content": "What is the weather in Paris?" } ], "tools": [{ "type": "function", "function": { "name": "get_weather", "description": "Get current weather for a city", "parameters": { "type": "object", "properties": { "city": { "type": "string" } }, "required": ["city"] } } }], "tool_choice": "auto" } ``` When the model calls a tool, the assistant message in the response contains a `tool_calls` array instead of (or alongside) `content`. 
Submit the tool result back as a `tool` message: ```json { "role": "tool", "tool_call_id": "call_abc123", "content": "{\"temperature\": 18, \"condition\": \"sunny\"}" } ``` ## Model selection `model` accepts any string that identifies a model on OpenRouter or a Civitai AIR URI: | Format | Example | Notes | |--------|---------|-------| | OpenRouter ID | `openai/gpt-4o-mini` | Any model from [openrouter.ai/models](https://openrouter.ai/models). | | OpenAI shorthand | `gpt-4o`, `gpt-4o-mini` | OpenRouter also accepts bare OpenAI model names. | | AIR URI | `urn:air:llm:model:civitai:@` | Routes to a Civitai-hosted model. | ## Parameters reference | Field | Default | Notes | |-------|---------|-------| | `model` | — ✅ | Model ID (OpenRouter) or AIR URI. | | `messages` | — ✅ | Array of role-discriminated messages (at least 1). | | `temperature` | `1` | 0–2. Higher = more random output. | | `topP` | `1` | 0–1. Nucleus sampling. Alternative to `temperature`; usually set one or the other. | | `maxTokens` | `null` | Max output tokens, 1–128 000. Unlimited when omitted. | | `n` | `1` | Number of completions to generate, 1–128. | | `stop` | `null` | Up to 4 stop sequences. | | `presencePenalty` | `0` | -2 to 2. Positive values discourage repeating topics. | | `frequencyPenalty` | `0` | -2 to 2. Positive values discourage repeating exact tokens. | | `seed` | `null` | Integer seed for deterministic output (beta). | | `user` | `null` | End-user identifier for abuse monitoring. | | `logprobs` | `null` | Return log probabilities for generated tokens. | | `topLogprobs` | `null` | 0–20. Number of top log-prob candidates per token (requires `logprobs: true`). | | `tools` | `null` | Function definitions available to the model. | | `tool_choice` | `null` | `"auto"`, `"none"`, `"required"`, or `{ "type": "function", "function": { "name": "..." } }`. | | `chatTemplateKwargs` | `null` | Extra kwargs passed to the model's chat template (vLLM-specific). 
| | `modalities` | `null` | Output modalities. Include `"image"` to route the request through the image-generation pipeline. See [Image generation](#image-generation). | | `imageConfig` | `null` | Image-generation parameters (`aspect_ratio`, `image_size`, `n`). Only consulted when `modalities` includes `"image"`. | ## Messages reference Messages are discriminated by the `role` field: ### `system` ```json { "role": "system", "content": "You are a helpful assistant.", "name": "optional" } ``` ### `user` Content can be a plain string or an array of content parts: ```json { "role": "user", "content": "Plain text" } ``` ```json { "role": "user", "content": [ { "type": "text", "text": "What's in this image?" }, { "type": "image_url", "image_url": { "url": "https://...", "detail": "auto" } } ] } ``` ### `assistant` ```json { "role": "assistant", "content": "Prior response text." } ``` Or with tool calls (as returned by the model): ```json { "role": "assistant", "content": null, "tool_calls": [{ "id": "call_abc123", "type": "function", "function": { "name": "get_weather", "arguments": "{\"city\":\"Paris\"}" } }] } ``` ### `tool` ```json { "role": "tool", "tool_call_id": "call_abc123", "content": "{\"temperature\": 18}" } ``` ## Reading the result The output is an OpenAI-compatible `chat.completion` object: ```json { "status": "succeeded", "steps": [{ "name": "0", "$type": "chatCompletion", "status": "succeeded", "output": { "id": "chatcmpl-...", "object": "chat.completion", "created": 1748000000, "model": "openai/gpt-4o-mini", "choices": [{ "index": 0, "message": { "role": "assistant", "content": "The capital of France is Paris." }, "finish_reason": "stop" }], "usage": { "prompt_tokens": 24, "completion_tokens": 9, "total_tokens": 33 } } }] } ``` The `/v1/chat/completions` endpoint returns the `output` object directly (not wrapped in a workflow envelope). ## Cost Cost depends on whether the model routes through OpenRouter or is a Civitai AIR model. 
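
The two pricing rules in this section can be sketched in a few lines of Python. The function names are made up for illustration, and the page does not say how fractional Buzz is rounded, so the OpenRouter helper returns the raw value; treat the result as an estimate rather than the billed amount.

```python
def openrouter_buzz_cost(actual_cost_usd):
    """buzzCost = actualCostUsd x 1000 x 1.3, with a 1 Buzz minimum.

    Rounding is not specified on this page, so no rounding is applied here.
    """
    return max(1.0, actual_cost_usd * 1000 * 1.3)


def air_buzz_cost(image_count, n=1):
    """total = 1 x (imageCount x 2) x n (flat rate for AIR models)."""
    return 1 * (image_count * 2) * n


# A call that cost 0.01 USD upstream, and one cheap enough to hit the minimum.
print(openrouter_buzz_cost(0.01))
print(openrouter_buzz_cost(0.0001))

# AIR model: two images, three completions, then a text-only call.
print(air_buzz_cost(2, n=3))
print(air_buzz_cost(0, n=1))
```

The last call shows the text-only case: with `imageCount = 0` the product is zero regardless of `n`.
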
### OpenRouter models Cost is computed from actual token usage with a **30% margin**, converted to Buzz (1 000 Buzz = 1 USD): ``` buzzCost = actualCostUsd × 1000 × 1.3 (minimum 1 Buzz) ``` Before execution, the orchestrator estimates cost using OpenRouter's published per-token prices. After execution, the final Buzz charge is based on the tokens actually consumed by the model. Different models have very different per-token prices; check [openrouter.ai/models](https://openrouter.ai/models) for current pricing. Representative examples: | Model | Input (per 1M tokens) | Output (per 1M tokens) | Typical single call | |-------|-----------------------|------------------------|---------------------| | `openai/gpt-4o-mini` | $0.15 | $0.60 | 1 Buzz (minimum) | | `openai/gpt-4o` | $2.50 | $10.00 | 2–15 Buzz | | `anthropic/claude-3-5-sonnet` | $3.00 | $15.00 | 4–20 Buzz | | `meta-llama/llama-3.3-70b-instruct` | $0.12 | $0.30 | 1 Buzz (minimum) | Use `whatif=true` on your first request to preview the estimated cost before committing. ### AIR models (Civitai-hosted) Flat-rate pricing based on image count and number of completions requested: ``` total = 1 × (imageCount × 2) × n ``` ::: warning Known limitation For text-only requests to AIR models (`imageCount = 0`), the `imageCount` factor collapses the product to **0 Buzz**. This is a known bug; expect it to be corrected in a future release. For now, AIR model text-only calls cost 0 Buzz. ::: ## Runtime Most chat completions finish in 5–30 seconds depending on model and output length. Use `wait=60` for simple requests; use `wait=0` + polling for long outputs, large `n`, or slow models. The `/v1/chat/completions` endpoint waits up to 60 seconds before timing out with `504`. ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with "messages must not be empty" | Empty `messages` array | Include at least one message. | | `400` with "model is required" | Missing `model` field | `model` is always required.
| | `504 Gateway Timeout` (via `/v1`) | Slow model or long output | Retry with `wait=0` via `SubmitWorkflow` + polling. | | `400` with "topLogprobs requires logprobs" | Sent `topLogprobs` without `logprobs: true` | Set `"logprobs": true` alongside `topLogprobs`. | | Response truncated mid-sentence | `maxTokens` reached | Raise `maxTokens` or omit it to let the model decide. | | Tool call in response instead of content | Expected behaviour | The model chose to call a tool — feed the `tool_calls` back as a `tool` message in the next turn. | | Step `failed`, `reason = "no_provider_available"` | AIR model offline or no worker available | Retry shortly. | ## Related * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — for the workflow-step path * [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling * [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling * [Prompt enhancement](./prompt-enhancement) — uses a `chatCompletion`-like step to rewrite image prompts * [Image conversion](./convert-image) — 1-Buzz utility step to post-process generated images --- --- url: /orchestration/recipes/training-other-image.md --- # Chroma / ERNIE / Qwen / Z-Image LoRA training Five smaller image-LoRA ecosystems share this page: each has its own `ecosystem` value and base checkpoint, but the request shape is otherwise the AI Toolkit standard. | `ecosystem` | Base | Buzz / epoch | Best for | |-------------|------|--------------|----------| | `chroma` | `lodestones/Chroma1-HD` | 200 | Chroma community model fine-tunes | | `ernie` | `baidu/ERNIE-Image` | 100 | ERNIE Image LoRAs | | `qwen` | Qwen-Image (versioned) | 200 | Qwen Image / Qwen-Image-Edit LoRAs | | `zimageturbo` | `ostris/Z-Image-De-Turbo` (+ Z-Image-Turbo extras) | 100 | Z-Image Turbo LoRAs (cheap, fast inference) | | `zimagebase` | `Tongyi-MAI/Z-Image` | 100 | Z-Image base LoRAs | Each ecosystem has its own subsection with a runnable example. 
The shared schema lives in [Common parameters](#common-parameters); ecosystem-specific quirks are in each subsection. ::: tip Long-running step Always submit with `wait=0`. These ecosystems run anywhere from ~10s/epoch (Z-Image Turbo) to ~2min/epoch (Chroma/Qwen). See [Results & webhooks](/orchestration/guide/results-and-webhooks). ::: ## The request shape ```json { "$type": "training", "input": { "engine": "ai-toolkit", "ecosystem": "chroma" // chroma | ernie | qwen | zimageturbo | zimagebase } } ``` ## Prerequisites * A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites)) * A training-data zip (signed R2 URL, Civitai R2 AIR, or any HTTPS URL) * An accurate `count` of images in the zip ## Chroma Trains on the Chroma1-HD base. Uses [`TextToImageV2Job`](/orchestration/reference/) for sample renders; output LoRA is usable wherever Chroma is supported. ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training"], "steps": [{ "$type": "training", "priority": "normal", "retries": 2, "input": { "engine": "ai-toolkit", "ecosystem": "chroma", "epochs": 5, "resolution": 1024, "lr": 0.0001, "trainTextEncoder": false, "lrScheduler": "cosine", "optimizerType": "adamw8bit", "networkDim": 16, "networkAlpha": 16, "trainingData": { "type": "zip", "sourceUrl": "https://civitai-delivery-worker-prod.5ac0637cfd0766c97916cefa3764fbdf.r2.cloudflarestorage.com/training-images/5418/2382561TrainingData.B6Tr.zip", "count": 10 }, "samples": { "prompts": [ "woman with red hair, playing chess at the park, dramatic explosion in background", "a woman holding a coffee cup, in a beanie, sitting at a cafe", "a horse acting as a DJ at a night club, fisheye lens, smoke machine, laser lights" ] } } }] } ``` Chroma defaults: `networkDim: 16`, `optimizerType: adamw8bit`, `trainTextEncoder: false`, `lrScheduler: cosine`. 200 Buzz / epoch. 
## ERNIE Trains on Baidu's ERNIE-Image. Comfy-based ecosystem with built-in diffuser. Uses [`ComfyImageGenJob`](/orchestration/reference/) for sample renders. ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training"], "steps": [{ "$type": "training", "priority": "normal", "retries": 2, "input": { "engine": "ai-toolkit", "ecosystem": "ernie", "epochs": 5, "lr": 0.0001, "trainTextEncoder": false, "lrScheduler": "cosine", "optimizerType": "adamw8bit", "networkDim": 32, "networkAlpha": 32, "trainingData": { "type": "zip", "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/7918795/2435272TrainingData.bJ7P.zip", "count": 10 }, "samples": { "prompts": ["a portrait of TOK", "TOK walking through a comic book city"] } } }] } ``` ERNIE defaults: `networkDim: 32`, `optimizerType: adamw8bit`, `trainTextEncoder: false`, `lrScheduler: cosine`. 100 Buzz / epoch. ## Qwen Trains on Qwen-Image. 
The `version` field selects a specific Qwen-Image release: | `version` | Base resolved to | |-----------|------------------| | `latest` (default) | `Qwen/Qwen-Image-Edit-2512` | | `2509` | `urn:air:qwen:checkpoint:civitai:1864281@2110043` | | `2512` | `Qwen/Qwen-Image-Edit-2512` (same as `latest`) | ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training"], "steps": [{ "$type": "training", "priority": "normal", "retries": 2, "input": { "engine": "ai-toolkit", "ecosystem": "qwen", "version": "latest", "epochs": 1, "resolution": 1024, "lr": 0.00011, "trainTextEncoder": false, "lrScheduler": "cosine", "optimizerType": "adamw8bit", "networkDim": 16, "networkAlpha": 16, "trainingData": { "type": "zip", "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/3315022/2526079TrainingData.o4S8.zip", "count": 10 }, "samples": { "prompts": [ "woman with red hair, playing chess at the park, dramatic explosion in background", "a woman holding a coffee cup, in a beanie, sitting at a cafe" ] } } }] } ``` Qwen defaults: `networkDim: 16`, `optimizerType: adamw8bit`, `trainTextEncoder: false`, `lrScheduler: cosine`. 200 Buzz / epoch. ## Z-Image Turbo Trains on `ostris/Z-Image-De-Turbo` and pulls in the original `Tongyi-MAI/Z-Image-Turbo` as an extras model. Output LoRA is usable in [Z-Image generation](./zimage) on the `turbo` model. 
```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training"], "steps": [{ "$type": "training", "priority": "normal", "retries": 2, "input": { "engine": "ai-toolkit", "ecosystem": "zimageturbo", "epochs": 7, "resolution": 512, "lr": 0.000611, "trainTextEncoder": false, "lrScheduler": "cosine", "optimizerType": "adamw8bit", "networkDim": 32, "networkAlpha": 32, "trainingData": { "type": "zip", "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/3315022/2526079TrainingData.o4S8.zip", "count": 10 }, "samples": { "prompts": ["a photo of TOK", "TOK in a garden", "TOK portrait"] } } }] } ``` Z-Image Turbo defaults: `networkDim: 32`, `optimizerType: adamw8bit`, `trainTextEncoder: false`. 100 Buzz / epoch. ## Z-Image Base Trains on `Tongyi-MAI/Z-Image`. The orchestrator overrides `optimizerType` to `automagic` and `lr` to `0.000001` regardless of what you submit — the input fields are accepted but ignored. Use the [Z-Image Turbo](#z-image-turbo) recipe instead unless you specifically need a base-model LoRA. ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training"], "steps": [{ "$type": "training", "priority": "normal", "retries": 2, "input": { "engine": "ai-toolkit", "ecosystem": "zimagebase", "epochs": 7, "resolution": 512, "lr": 0.000611, "trainTextEncoder": false, "lrScheduler": "cosine", "networkDim": 32, "networkAlpha": 32, "trainingData": { "type": "zip", "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/3315022/2526079TrainingData.o4S8.zip", "count": 10 }, "samples": { "prompts": ["a photo of TOK", "TOK in a garden", "TOK portrait"] } } }] } ``` Z-Image Base defaults: `networkDim: 32`, `optimizerType: automagic` (overridden), `lr: 0.000001` (overridden), `trainTextEncoder: false`. 100 Buzz / epoch. 
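
Since the five requests above differ only in a handful of fields, a small builder can capture the pattern. The Python sketch below is a hypothetical helper (`build_training_step` and `training_buzz` are illustrative names, and the dataset URL is a placeholder); it copies the per-ecosystem `networkDim` defaults and Buzz rates stated in the sections above and leaves `lr`, `optimizerType`, and the other fields unset so the server-side defaults apply.

```python
# Per-ecosystem values copied from the sections above.
ECOSYSTEMS = {
    "chroma":      {"networkDim": 16, "buzzPerEpoch": 200},
    "ernie":       {"networkDim": 32, "buzzPerEpoch": 100},
    "qwen":        {"networkDim": 16, "buzzPerEpoch": 200},
    "zimageturbo": {"networkDim": 32, "buzzPerEpoch": 100},
    "zimagebase":  {"networkDim": 32, "buzzPerEpoch": 100},
}


def build_training_step(ecosystem, source_url, count, epochs=5, prompts=()):
    """Build a `training` step for one of the five ecosystems.

    Hypothetical helper: only fields shown in the examples above are set.
    """
    if ecosystem not in ECOSYSTEMS:
        raise ValueError(f"unknown ecosystem: {ecosystem}")
    dim = ECOSYSTEMS[ecosystem]["networkDim"]
    step_input = {
        "engine": "ai-toolkit",
        "ecosystem": ecosystem,
        "epochs": epochs,
        "networkDim": dim,
        "networkAlpha": dim,  # the examples keep alpha equal to dim
        "trainingData": {"type": "zip", "sourceUrl": source_url, "count": count},
        "samples": {"prompts": list(prompts)},
    }
    return {"$type": "training", "input": step_input}


def training_buzz(ecosystem, epochs):
    """Per-epoch training cost only; sample rendering is billed separately."""
    return ECOSYSTEMS[ecosystem]["buzzPerEpoch"] * epochs


step = build_training_step(
    "chroma",
    "https://example.com/dataset.zip",  # placeholder URL, not a real dataset
    count=10,
    prompts=["a portrait of TOK"],
)
workflow = {"tags": ["training"], "steps": [step]}
print(workflow["steps"][0]["input"]["networkDim"], training_buzz("chroma", 5))
```

The `workflow` dictionary mirrors the request bodies shown above and would be posted to the same `workflows?wait=0` endpoint.
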
## Common parameters {#common-parameters} Defaults shown are the post-`ApplyDefaults` values; per-ecosystem deviations are noted above. | Field | Required | Default | Notes | |-------|----------|---------|-------| | `engine` | ✅ | — | Always `ai-toolkit`. | | `ecosystem` | ✅ | — | One of: `chroma`, `ernie`, `qwen`, `zimageturbo`, `zimagebase`. | | `version` | (qwen only) | `latest` | `latest`, `2509`, `2512`. Selects the Qwen-Image base release. | | `epochs` | | `5` | `1`–`20`. Billed per epoch. | | `numberOfRepeats` | | varies (see ecosystem) | `1`–`5000`. ERNIE / Z-Image auto-derive `ceil(200 / count)`; Chroma / Qwen don't auto-set. | | `lr` | | `0.0001` | UNet learning rate. | | `trainTextEncoder` | | `false` | All five ecosystems leave the text encoder frozen. | | `lrScheduler` | | `cosine` | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. | | `optimizerType` | | `adamw8bit` (`automagic` for Z-Image Base) | Full enum on the [SDXL/SD1 page](./training-sdxl-sd1#common-parameters). | | `networkDim` | | `32` (`16` for Chroma / Qwen) | `1`–`256`. | | `networkAlpha` | | matches `networkDim` | `1`–`256`. | | `noiseOffset` | | `0` | `0`–`1`. | | `flipAugmentation` | | `false` | Random horizontal flips. | | `shuffleTokens` / `keepTokens` | | `false` / `0` | Caption-tag shuffling. | | `triggerWord` | | *(none)* | Activation token. Recommended for character / style LoRAs on Chroma, Z-Image. | | `trainingData.{type, sourceUrl, count}` | ✅ | — | `type: "zip"`. | | `samples.prompts[]` | | `[]` | Per-epoch preview prompts rendered with the trained LoRA. | | `samples.negativePrompt` | | *(none)* | — | ## Reading the result Same envelope as the other training recipes — see [SDXL/SD1 → Reading the result](./training-sdxl-sd1#reading-the-result). Each epoch yields a `.safetensors` LoRA blob plus any sample images. 
The trained LoRA is usable in the corresponding generation recipe — Chroma LoRAs in any Chroma workflow, ERNIE LoRAs in [ERNIE image generation](./ernie), Qwen LoRAs in [Qwen image generation](./qwen), Z-Image LoRAs in [Z-Image generation](./zimage). ## Runtime Per-epoch wall time, default settings on a 10-image dataset: | Ecosystem | Per-epoch | Typical full run | |-----------|-----------|-------------------| | `chroma` | ~60–120 s | 5–15 min for 5 epochs | | `ernie` | ~30–60 s | 3–8 min for 5 epochs | | `qwen` | ~60–120 s | 5–15 min for 5 epochs | | `zimageturbo` | ~10–25 s | 1–4 min for 7 epochs | | `zimagebase` | ~10–25 s | 1–4 min for 7 epochs | Always use `wait=0`. ## Cost ``` total = costPerEpoch × epochs ``` | Ecosystem | Buzz / epoch | `epochs: 5` | `epochs: 10` | |-----------|--------------|-------------|--------------| | `chroma` | 200 | 1000 | 2000 | | `ernie` | 100 | 500 | 1000 | | `qwen` | 200 | 1000 | 2000 | | `zimageturbo` | 100 | 500 | 1000 | | `zimagebase` | 100 | 500 | 1000 | Sample-prompt rendering is billed separately at each ecosystem's image-generation rate. Use `whatif=true` (the **Preview cost** button on the widgets above) to confirm exact charges before submitting. ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with "ecosystem unknown" | Typo, or not one of `chroma` / `ernie` / `qwen` / `zimageturbo` / `zimagebase` | Check spelling. | | `400` with "version not allowed" (Qwen only) | `version` not one of `latest` / `2509` / `2512` | Use one of the listed values. | | Z-Image Base: `optimizerType` you set seems ignored | Intentional — `ApplyDefaults` overrides to `automagic` | Use Z-Image Turbo if you need full optimizer control. | | Trained LoRA underbaked | Too few epochs / too low `lr` | Raise `epochs` to 8–15 (these ecosystems often need more epochs than SDXL); keep `lr` ≤ `5e-4`. | | Trained LoRA overcooked | Too many epochs or `networkDim` too high | Drop `networkDim` to 16, lower `epochs`. 
| | Step `failed`, `moderationStatus: "Rejected"` | Dataset failed content moderation | Replace flagged images. | ## Related * [SDXL & SD1 LoRA training](./training-sdxl-sd1) — classic Stable Diffusion ecosystems * [Flux 1 LoRA training](./training-flux1) / [Flux 2 Klein LoRA training](./training-flux2-klein) — Flux family * [Wan video LoRA training](./training-wan) / [LTX2 video LoRA training](./training-ltx2) — video LoRAs * Generation recipes for these ecosystems: [Z-Image](./zimage), [Qwen](./qwen), [ERNIE](./ernie) * [Results & webhooks](/orchestration/guide/results-and-webhooks) * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) / [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) * [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/training/openapi.yaml) --- --- url: /site.md description: 'REST API for browsing models, images, creators, and tags on civitai.com.' --- # Civitai Site API The Civitai site exposes a public REST API at `https://civitai.com/api/v1/...` for browsing models, model versions, images, creators, and tags. It's the same surface that powers third-party tools like Stable Diffusion downloaders and metadata lookup utilities. This is **not** the Orchestration API. If you want to *submit* generation work, see the [Orchestration docs](/orchestration/). ## Where to start * **[Guide](./guide/)** — authentication, pagination, error handling, and the AIR (AI Resource Identifier) format. * **[Reference](./reference/)** — per-resource documentation for every public endpoint, sourced directly from the current Next.js handlers. ## Quick example ```bash # Public — no auth required curl "https://civitai.com/api/v1/models?limit=1&types=LORA" # Authenticated — pass a Civitai API token curl -H "Authorization: Bearer $CIVITAI_TOKEN" \ "https://civitai.com/api/v1/me" ``` See [Getting started](./guide/getting-started) for a full walkthrough. 
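
For clients that are not shell scripts, the same two calls might look like this in Python using only the standard library. This is a sketch: the helper names are illustrative, and it builds the URL and request object without sending anything, so it runs without network access or a real token.

```python
import os
import urllib.parse
import urllib.request

BASE_URL = "https://civitai.com/api/v1"


def public_models_url(**params):
    """URL for the public model listing, e.g. limit=1, types='LORA'."""
    return f"{BASE_URL}/models?{urllib.parse.urlencode(params)}"


def authenticated_request(path, token):
    """Build (but do not send) a request carrying the bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


url = public_models_url(limit=1, types="LORA")
req = authenticated_request("/me", os.environ.get("CIVITAI_TOKEN", "<token>"))
print(url)
print(req.full_url)
# To actually send either one, pass it to urllib.request.urlopen(...).
```
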
--- --- url: /site/reference/creators.md description: List Civitai creators. --- # Creators Creators are users who have published at least one model on Civitai. ## List creators ``` GET /api/v1/creators ``` **Auth:** Public. ### Query parameters | Name | Type | Default | Description | |------|------|---------|-------------| | `limit` | integer (1–200) | 20 | Number of items per page. | | `page` | integer (≥ 1) | 1 | 1-indexed page number. | | `query` | string | — | Full-text search on username. | Pagination is page-based only — there is no `cursor` parameter on this endpoint. ### Response ```json { "items": [ { "username": "JustMaier", "modelCount": 3, "link": "https://civitai.com/api/v1/models?username=JustMaier", "image": "https://image.civitai.com/.../JustMaier.jpeg" } ], "metadata": { "totalItems": 84916, "currentPage": 1, "pageSize": 1, "totalPages": 84916, "nextPage": "https://civitai.com/api/v1/creators?limit=1&page=2" } } ``` ### Field notes * `link` is pre-built — follow it to list a creator's models via [`GET /models`](./models#list-models). * `modelCount` is only included when greater than zero; creators with no published models are excluded from the listing entirely. * `image` is null when the creator has no avatar. ### Notes * For very deep traversals, scope with `?query=` rather than paging linearly — the listing is sorted alphabetically by username, so `query=A`, `query=B`, ... is a reliable way to walk the full set. ### Examples ```bash # First page curl "https://civitai.com/api/v1/creators?limit=20" # Find a specific creator curl "https://civitai.com/api/v1/creators?query=JustMaier" ``` --- --- url: /site/reference/enums.md description: Valid enum values used across the Civitai site API. --- # Enums ``` GET /api/v1/enums ``` **Auth:** Public. Returns the current set of enum values used elsewhere in the site API — model types, file types, base models, and their sub-types. 
Call this endpoint to discover valid values for query params like `types=` and `baseModels=` on [`GET /models`](./models), rather than hardcoding lists. ### Response ```json { "ModelType": [ "Checkpoint", "TextualInversion", "Hypernetwork", "AestheticGradient", "LORA", "LoCon", "DoRA", "Controlnet", "Upscaler", "MotionModule", "VAE", "Poses", "Wildcards", "Workflows", "Detection", "Other" ], "ModelFileType": [ "Model", "Text Encoder", "Pruned Model", "Negative", "Training Data", "VAE", "Config", "Archive" ], "ActiveBaseModel": [ "Flux.1 D", "Flux.2 D", "SDXL 1.0", "Illustrious", "Qwen", "Wan Video 2.2 T2V-A14B", "ZImageTurbo", "..." ], "BaseModel": [ "SD 1.5", "SD 2.1", "SD 3.5", "SDXL 1.0", "Flux.1 D", "Illustrious", "Pony", "Hunyuan Video", "..." ], "BaseModelType": [ "Standard", "Inpainting", "Refiner", "Pix2Pix" ] } ``` Only the shape is guaranteed above — the list contents change as Civitai adds support for new model families. Always fetch live values rather than baking them into clients. ### Key distinctions * **`ModelType`** — the kind of artifact (checkpoint vs. LoRA vs. VAE, etc.). Use as the `types=` filter on `GET /models`. * **`ModelFileType`** — the role of a file *within* a model version (main model, VAE, text encoder, training data). Appears as `files[].type`. * **`BaseModel`** — every base model Civitai has ever catalogued. Use as `baseModels=` when filtering. * **`ActiveBaseModel`** — the subset of `BaseModel` that Civitai's on-site generation currently supports. If you're building around Orchestration workflows, filter to these. * **`BaseModelType`** — sub-classification of a base model (e.g. Standard vs. Inpainting SDXL). Appears as `baseModelType` on model versions. ### Example ```bash curl "https://civitai.com/api/v1/enums" | jq '.ModelType' ``` --- --- url: /orchestration/recipes/ernie.md --- # ERNIE image generation Baidu's ERNIE Image is a distillation-friendly text-to-image family hosted on Civitai's Comfy workers. 
Single engine path, one operation (`createImage` — no img2img, variant, or edit support), two model variants that differ only in speed vs. quality: * `engine: "comfy"`, `ecosystem: "ernie"` * **Only `createImage`** — ERNIE doesn't expose `createVariant` or `editImage`. Use [Flux 2 Klein](./flux2#klein-createvariant-img2img) or [Qwen](./qwen) if you need img2img or prompt-driven editing. * Built-in diffuser, VAE, and text encoder — no `model` URN to pick. The only `model` field is the variant selector (`ernie` or `turbo`). * LoRA support (ERNIE-tagged LoRAs only). ## Variants | `model` | Steps (default) | `cfgScale` (default) | Best for | |---------|-----------------|----------------------|----------| | `ernie` | `20` | `4` | **Default** — full-quality output, standard sampling | | `turbo` | `8` | `1` | Distilled for speed — 3–4× faster and ~40% of the Buzz per image at defaults; use for drafts, batches, and iteration | Leave `cfgScale: 1` on `turbo` — it's a distilled model and doesn't respond to classifier-free guidance the way the standard variant does. 
## Prerequisites * A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites)) * No checkpoint URN required — both variants ship with built-in diffuser / VAE / text encoder ## Standard (`model: "ernie"`) ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60 Authorization: Bearer Content-Type: application/json { "steps": [{ "$type": "imageGen", "input": { "engine": "comfy", "ecosystem": "ernie", "model": "ernie", "operation": "createImage", "prompt": "A red panda wearing a yellow rain jacket, cinematic soft light, highly detailed", "width": 1024, "height": 1024, "steps": 20, "cfgScale": 4, "sampler": "euler", "scheduler": "simple", "quantity": 1 } }] } ``` ### Parameters | Field | Default | Range | Notes | |-------|---------|-------|-------| | `prompt` | — ✅ | ≤ 10 000 chars | Natural-language descriptions work well; ERNIE handles complex scenes better than tag-soup. | | `negativePrompt` | *(none)* | ≤ 10 000 chars | Optional. Shorter is usually better — ERNIE's defaults are already clean. | | `width` / `height` | `1024` | `64`–`2048`, divisible by 16 | Trained around 1024². Well-behaved near ~1 megapixel total. | | `steps` | `20` | `1`–`150` | Diminishing returns past ~25. | | `cfgScale` | `4` | `0`–`30` | `3`–`5` is the sweet spot. | | `sampler` | `euler` | enum | [`ComfySampler`](/orchestration/reference/). `euler` is what the model was tuned against. | | `scheduler` | `simple` | enum | [`ComfyScheduler`](/orchestration/reference/). | | `loras` | `{}` | `{ airUrn: strength }` | Stack multiple. Only `urn:air:ernie:lora:...` LoRAs work here. | | `quantity` | `1` | `1`–`12` | Number of images per call. | | `seed` | random | int64 | Pin for reproducibility. 
| ### Portrait aspect ratio ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "comfy", "ecosystem": "ernie", "model": "ernie", "operation": "createImage", "prompt": "Portrait of a woman with flowing hair standing in a blooming cherry blossom field, golden hour lighting", "negativePrompt": "worst quality, blurry, low resolution", "width": 832, "height": 1216, "steps": 20, "cfgScale": 4, "sampler": "euler", "scheduler": "simple", "seed": 42 } }] } ``` ## Turbo (`model: "turbo"`) Distilled variant — same input surface as standard, just lower defaults for `steps` and `cfgScale`. Use this as the default when you're iterating on prompts or generating batches; fall back to `ernie` for hero shots. ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60 Authorization: Bearer Content-Type: application/json { "steps": [{ "$type": "imageGen", "input": { "engine": "comfy", "ecosystem": "ernie", "model": "turbo", "operation": "createImage", "prompt": "A red panda wearing a yellow rain jacket, cinematic soft light, highly detailed", "width": 1024, "height": 1024, "steps": 8, "cfgScale": 1, "sampler": "euler", "scheduler": "simple", "quantity": 1 } }] } ``` Turbo-specific tuning: | Field | Default | Notes | |-------|---------|-------| | `steps` | `8` | Stay in `6`–`12`. Pushing past `~16` wastes Buzz without improving output on the distilled model. | | `cfgScale` | `1` | Distilled — leave at `1`. Raising it usually over-saturates / burns the output. | Everything else (`prompt`, `negativePrompt`, dimensions, `sampler`, `scheduler`, `seed`, `quantity`, `loras`) matches the standard variant. 
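Because both variants share one input surface, a small helper can fill in the variant defaults and catch the dimension rules before a request is sent. The sketch below is illustrative only: `build_ernie_step` and `ERNIE_VARIANT_DEFAULTS` are hypothetical names, the defaults and the `64`–`2048` / divisible-by-16 rule are copied from the tables above, and the server remains the authority on validation.

```python
# Defaults per variant, copied from the Variants table (assumed to stay in sync with it).
ERNIE_VARIANT_DEFAULTS = {
    "ernie": {"steps": 20, "cfgScale": 4},
    "turbo": {"steps": 8, "cfgScale": 1},
}


def build_ernie_step(prompt, model="ernie", width=1024, height=1024, **overrides):
    """Return one `imageGen` step dict for the ERNIE ecosystem."""
    if model not in ERNIE_VARIANT_DEFAULTS:
        raise ValueError("model must be 'ernie' or 'turbo'")
    for name, value in (("width", width), ("height", height)):
        # Mirrors the documented range: 64-2048, divisible by 16.
        if not 64 <= value <= 2048 or value % 16 != 0:
            raise ValueError(f"{name} must be within 64-2048 and divisible by 16")
    step_input = {
        "engine": "comfy",
        "ecosystem": "ernie",
        "model": model,
        "operation": "createImage",
        "prompt": prompt,
        "width": width,
        "height": height,
        **ERNIE_VARIANT_DEFAULTS[model],
        "sampler": "euler",
        "scheduler": "simple",
        "quantity": 1,
    }
    # Caller-supplied fields (seed, loras, steps, ...) win over the defaults.
    step_input.update(overrides)
    return {"$type": "imageGen", "input": step_input}


# Request body for a turbo draft with a pinned seed.
body = {"steps": [build_ernie_step("A red panda in a rain jacket", model="turbo", seed=42)]}
```

Serialize `body` with `json.dumps` and POST it to the workflows endpoint shown in the examples above.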
## Reading the result ERNIE emits the standard `imageGen` output — an `images[]` array, one entry per `quantity`: ```json { "status": "succeeded", "steps": [{ "$type": "imageGen", "name": "$0", "status": "succeeded", "output": { "images": [ { "id": "aa6e7228-68cd-4d15-b4d7-5005b2bfbac6-0.jpg", "width": 1024, "height": 1024, "url": "https://orchestration.civitai.com/v2/consumer/blobs/…?sig=…", "urlExpiresAt": "2027-04-15T17:18:54.3195353Z", "previewUrl": "https://orchestration.civitai.com/v2/consumer/blobs/…?sig=…", "previewUrlExpiresAt": "2027-04-15T17:18:54.3196735Z", "available": true, "nsfwLevel": "pg13" } ], "errors": [] } }] } ``` `url` and `previewUrl` are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL. The `nsfwLevel` field carries the moderation classification applied to the output. ## Cost Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection. Per-pixel + per-step scaling against 1024² and the variant's default step count: **Standard** (`ComfyErnieStandardCreateImageGenInput.CalculateCost`): ``` total = 20 × (width × height / 1024²) × (steps / 20) × quantity ``` **Turbo** (`ComfyErnieTurboCreateImageGenInput.CalculateCost`): ``` total = 8 × (width × height / 1024²) × (steps / 8) × quantity ``` | Shape | Standard (Buzz) | Turbo (Buzz) | |-------|-----------------|--------------| | 1024² / default steps / `quantity: 1` (defaults) | **20** | **8** | | 832×1216 / default steps / `quantity: 1` | ~20 | ~8 | | 1024² / default steps / `quantity: 4` | ~80 | ~32 | | 1024² / `steps: 40` (standard) / `steps: 16` (turbo) | ~40 | ~16 | Standard pricing is ~2.5× turbo at defaults — reach for turbo when iterating on prompts. 
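For budgeting before a `whatif=true` call, the two formulas above can be mirrored locally. This is a rough sketch: `estimate_ernie_buzz` is a hypothetical helper, it returns the unrounded value of the formula as written, and any rounding the orchestrator applies is not modeled.

```python
from typing import Optional

# (base Buzz at 1024x1024, default step count) per variant, from the formulas above.
ERNIE_COST_BASE = {
    "ernie": (20, 20),
    "turbo": (8, 8),
}

REFERENCE_PIXELS = 1024 * 1024


def estimate_ernie_buzz(model: str, width: int = 1024, height: int = 1024,
                        steps: Optional[int] = None, quantity: int = 1) -> float:
    """Unrounded estimate: base x pixel ratio x step ratio x quantity."""
    base, default_steps = ERNIE_COST_BASE[model]
    if steps is None:
        steps = default_steps
    pixel_factor = (width * height) / REFERENCE_PIXELS
    step_factor = steps / default_steps
    return base * pixel_factor * step_factor * quantity


print(estimate_ernie_buzz("ernie"))                         # 20.0
print(estimate_ernie_buzz("turbo", quantity=4))             # 32.0
print(estimate_ernie_buzz("ernie", width=832, height=1216)) # 19.296875
```

Treat the result as an estimate only; `whatif=true` is still the source of truth for the actual charge.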
## Runtime Claim duration (`job.startedAt` → `job.completedAt`) measured against `orchestration-next` with `quantity: 1`: | Variant | Shape | Claim duration | |---------|-------|----------------| | `ernie` (standard) | 1024² / 20 steps | ~29 s | | `ernie` (standard) | 832×1216 / 20 steps | ~27 s | | `turbo` | 1024² / 8 steps | ~13 s | `wait=60` covers single-image calls comfortably. For `quantity > 1`, larger dimensions, or high `steps` counts, compute + queue wait typically runs past the 60 s long-poll ceiling — submit with `wait=60` and re-issue `GET /v2/consumer/workflows/{id}?wait=60` on a loop until the response is terminal (see [Submitting work → Waiting for results](/orchestration/guide/submitting-work#waiting-for-results)), or register a webhook. ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with "No derived type found for discriminator value 'ernie'" on `ecosystem` | ERNIE not yet rolled out to the environment you're hitting | Confirm the target orchestrator lists `ernie` in `/openapi/v2-consumers.json` → `ComfyImageGenInput` `ecosystem` enum. Retry after rollout. | | `400` with "operation must be createImage" | Passed `editImage` or `createVariant` | ERNIE only supports `createImage`. Use [Qwen](./qwen) or [Flux 2 Klein](./flux2#klein-createvariant-img2img) for img2img / edit. | | `400` on `model` | Sent a full AIR URN or a value other than `ernie` / `turbo` | The `model` field is a variant selector, not a checkpoint URN. Only `"ernie"` and `"turbo"` are valid. | | `400` on `width` / `height` | Value not divisible by 16, or outside `64`–`2048` | Round to a valid multiple of 16 inside that range. | | Turbo output looks over-saturated / blown out | `cfgScale > 1` on the distilled model | Set `cfgScale: 1` for turbo. Raise `steps` instead if you want more fidelity. | | Standard output ignores the prompt | `cfgScale` too low | Bump toward `4`–`6`. `cfgScale: 1` on standard barely steers the model. 
| | LoRA silently has no effect | Wrong AIR URN, or ecosystem mismatch | Only `urn:air:ernie:lora:…` LoRAs work here. Verify the URN on the LoRA's Civitai page. | | Request timed out (`wait` expired) | Large `quantity`, atypical dimensions, or high `steps` | Resubmit and resume with a `GET …?wait=60` loop, or register a webhook. | | Step `failed`, `reason = "blocked"` | Prompt hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). | ## Related * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here * [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling * [Qwen image generation](./qwen) — alternative with edit + variant operations and LoRA support * [Flux 2 image generation](./flux2) — higher-fidelity general-purpose alternative with `createVariant` * [Z-Image generation](./zimage) — the other distilled, extremely cheap + fast image recipe (sdcpp-based) * [Anima image generation](./anima) — anime-tuned sdcpp ecosystem, same single-operation shape * [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output * [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref` * Full parameter catalog: `ComfyErnieStandardCreateImageGenInput` and `ComfyErnieTurboCreateImageGenInput` in the [API reference](/orchestration/reference/) * [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator --- --- url: /site/guide/errors.md description: Error response shape and HTTP status codes used by the Civitai site API. 
--- # Errors ## Response shape Most errors come back as a single-field JSON object: ```json { "error": "descriptive message" } ``` Some errors originate inside the internal tRPC layer and get forwarded with a richer shape: ```json { "code": "UNAUTHORIZED", "message": "descriptive message", "issues": [ /* optional Zod validation details */ ] } ``` Either way, inspect the HTTP status code first and use the body for a human-readable explanation. ## Status codes | Status | Meaning | Typical cause | |--------|---------|---------------| | **200** | OK | Successful read. | | **400** | Bad Request | Invalid query parameters. For list endpoints, response body includes Zod-style validation issues. Combining `?query=` with `?page=` also returns 400. | | **401** | Unauthorized | Missing or invalid token on an `Authenticated` endpoint, or an auth-only filter (e.g. `?favorites=true`) on a `Mixed` endpoint without a session. | | **403** | Forbidden | Valid token, but the user is not permitted to access the resource. | | **404** | Not Found | Unknown model, version, or hash. Body shape: `{"error": "No model with id 0"}`. | | **405** | Method Not Allowed | Wrong HTTP verb for the endpoint. | | **429** | Too Many Requests | Either edge rate limiting (Cloudflare) or the `page * limit > 1000` pagination cap — see [Pagination](./pagination). | | **500** | Internal Server Error | Unexpected failure. Safe to retry with backoff. | ## Retries The API does not expose a `Retry-After` header for most failures. For 5xx and 429 responses, apply exponential backoff starting at ~1 second and capped at ~30 seconds. Don't retry 4xx responses other than 429 — the request shape itself is the problem. ## Rate limits There is no per-endpoint rate limit exposed as a stable contract. Cloudflare enforces edge limits (generic DDoS / abuse protection) in front of the API; those are operational, not a published SLA. Treat unexpected 429s as a signal to back off, not as a scheme to code against. 
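The guidance under Retries can be expressed as two small functions. This is a sketch under the numbers stated on this page (start near 1 second, double each attempt, cap near 30 seconds); `should_retry`, `backoff_delay`, and `retry_schedule` are hypothetical names, and adding random jitter on top is a common refinement that this page does not require.

```python
def should_retry(status: int) -> bool:
    """Retry only 429 and 5xx; other 4xx responses mean the request itself is wrong."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to sleep before retry number `attempt` (0-based): base * 2^attempt, capped."""
    return min(cap, base * (2 ** attempt))


def retry_schedule(max_attempts: int) -> list:
    """Full list of delays for a bounded number of retries."""
    return [backoff_delay(i) for i in range(max_attempts)]


print(retry_schedule(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

A caller would check `should_retry(response_status)` after each failure and sleep for the next delay in the schedule before trying again.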
--- --- url: /orchestration/guide/errors-and-retries.md --- # Errors & Retries Errors surface in two places: 1. **HTTP responses** on the submit / get / update endpoints — validation, auth, rate-limit, or server issues. 2. **Step status** inside a returned workflow — the request succeeded, but a step reached `failed`, `expired`, or `canceled`. The orchestrator races providers internally, so transient provider failures (one worker crashing, one region being slow) are usually invisible to you — a different provider claims the job. You only see step-level failures once every viable provider has been exhausted. ## HTTP error shape Validation and error responses use the standard [RFC 7807](https://www.rfc-editor.org/rfc/rfc7807) `application/problem+json` shape ([`ProblemDetails`](/orchestration/reference/operations/SubmitWorkflow) / [`ValidationProblemDetails`](/orchestration/reference/operations/SubmitWorkflow)): ```json { "type": "https://tools.ietf.org/html/rfc9110#section-15.5.1", "title": "One or more validation errors occurred.", "status": 400, "errors": { "steps[0].input.resolution": [ "The value '4k' is not valid for resolution." ] } } ``` Fields: * `status` — the HTTP status code (also in the response status line). * `title` — short, human-readable summary. * `detail` — longer description when the orchestrator can give one. * `errors` — present on `400` validation failures; maps property paths (such as `steps[0].input.resolution`) to lists of messages. * `instance` / `type` — optional URIs identifying the specific occurrence and error class. ## HTTP status taxonomy | Code | Meaning | Retry? | |------|---------|--------| | `400 Bad Request` | Body failed validation. `errors` map tells you exactly which fields. | **No** — fix the request. | | `401 Unauthorized` | Missing / expired / malformed bearer token. | **No** — obtain a new token. | | `403 Forbidden` | Token is valid but can't perform this operation (recipe not enabled for your tier, mature content not permitted, etc.). 
| **No** — request access; don't retry as-is. | | `404 Not Found` | Workflow or blob ID doesn't exist (or the token can't see it). | **No** — check the ID. | | `409 Conflict` | The workflow is in a state that blocks this mutation (e.g. updating a step that already started). | **Maybe** — refetch and reconcile. | | `429 Too Many Requests` | Rate-limit hit. | **Yes, with backoff.** | | `5xx Server Error` | Transient orchestrator issue. | **Yes, with backoff.** | Retry guidance for `429` / `5xx`: exponential backoff with jitter, capped at ~30 s between attempts, give up after ~5 tries. Don't retry `400` / `401` / `403` / `404` until you've fixed the underlying issue. ## Step-level failures A `200` / `202` on submit means the workflow was accepted — individual steps can still fail later. The [`Workflow`](/orchestration/reference/operations/GetWorkflow) payload you get back (or that arrives by webhook) carries per-step status: ```json { "id": "wf_01HXYZ...", "status": "failed", "steps": [ { "name": "0", "$type": "videoGen", "status": "failed", "jobs": [ { "status": "failed", "reason": "no_provider_available", "blockedReason": null } ] } ] } ``` Workflow / step / job statuses share the same enum: `unassigned`, `preparing`, `scheduled`, `processing`, `succeeded`, `failed`, `expired`, `canceled`. Terminal states are `succeeded`, `failed`, `expired`, `canceled` — once reached, [they do not change](./results-and-webhooks#delivery-semantics). The `reason` and `blockedReason` fields on failed jobs are the best hint at *why*: | `reason` | What it means | Your move | |----------|---------------|-----------| | `no_provider_available` | No provider can run this job with the given inputs (unusual resolution, unsupported duration, restricted region, etc.). | Relax inputs, try another `provider`/`version`, or retry later. | | `blocked` | The job was blocked by content moderation. `blockedReason` explains further. | Don't retry the same input; rework the prompt or image. 
| | `timeout` / `expired` | Job exceeded its internal deadline. | Safe to resubmit — possibly with a smaller workload. | | `canceled` | Someone (you or an operator) canceled the workflow via [`DeleteWorkflow`](/orchestration/reference/operations/DeleteWorkflow). | No retry unless you actually want to re-run it. | When `reason` is absent, the failure is generic — safe to retry once with the same body. ## Webhook retries If you've registered callbacks, the orchestrator retries transient failures on your endpoint automatically. See [Results & webhooks → Delivery semantics](./results-and-webhooks#delivery-semantics) for the serialization guarantees you can rely on. ## Common gotchas * **Blob URLs 403 after a few minutes.** The signed URL expired — refetch the workflow (or call [`GetBlob`](/orchestration/reference/operations/GetBlob)) for a fresh one. This isn't a real failure. * **`202` after `wait=90`.** The workflow didn't finish within the [100-second request timeout](./getting-started#_3-poll-if-you-didn-t-wait-inline). Expected for video / training / large-batch jobs — continue via webhooks or polling. * **Step `canceled` unexpectedly.** Check whether another process called [`DeleteWorkflow`](/orchestration/reference/operations/DeleteWorkflow). The orchestrator itself only cancels on explicit request or when a dependent step already failed. --- --- url: /orchestration/recipes/flux1.md --- # Flux 1 image generation Flux 1 is Black Forest Labs' original open-weights family (Dev / Schnell plus the commercial Kontext tier). The whole family is the **`flux1` ecosystem** on the orchestrator — same checkpoint family, same AIR prefix (`urn:air:flux1:…`), same resource pool for workers and capability matching. 
What differs is how you *invoke* it: there's no single `engine: "flux1"` discriminator, so you pick one of three `engine` values depending on what you want: | `engine` | Best for | Notes | |----------|----------|-------| | `sdcpp` (ecosystem `flux1`) | **Default** — Stable Diffusion C++ on Civitai workers | Only `diffuserModel` is required; VAE / CLIP-L / T5-XXL default to sensible components. Supports LoRAs, `createImage` / `createVariant` / `editImage`. | | `comfy` (ecosystem `flux1`) | When you specifically need ComfyUI sampler knobs | Full sampler/scheduler enum control, LoRA support, checkpoint via AIR URN. Picks a heavier worker than sdcpp — reach for this only if you need a Comfy-specific sampler. | | `flux1-kontext` (ecosystem `flux1`) | Image editing / prompt-based edits via BFL's managed Kontext API | `dev` / `pro` / `max` tiers; the `ecosystem` field isn't in the request body but the endpoint lives in the same ecosystem internally | **Default choice for new integrations**: `engine: "sdcpp"`, `ecosystem: "flux1"`. Sdcpp's defaults handle the component models for you, so you only need to pick a diffuser. Reach for `comfy` when you need a specific Comfy sampler; use `flux1-kontext` when you want BFL's managed editor. If you're starting fresh and don't need Flux.1 specifically, consider [Flux 2](./flux2) — cleaner schema, better quality, same orchestration-side usage. ## Prerequisites * A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites)) * A Flux.1 diffuser AIR URN (for sdcpp / comfy paths) — browse the [Civitai Flux 1.D catalog](https://civitai.com/models?baseModels=Flux.1+D) * For `createVariant` / `editImage` / Kontext editing: one or more source image URLs ## sdcpp (default path) Runs Flux.1 on Civitai's sdcpp workers. Minimal required input — just pick a diffuser and write a prompt. 
Every other model component (VAE, CLIP-L, T5-XXL) has a working default; LoRAs, samplers, and dimensions are tunable. ### Text-to-image ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60 Authorization: Bearer Content-Type: application/json { "steps": [{ "$type": "imageGen", "input": { "engine": "sdcpp", "ecosystem": "flux1", "operation": "createImage", "diffuserModel": "urn:air:flux1:diffuser:civitai:618692@691639", "prompt": "A photorealistic portrait of a woman in a cyberpunk city, neon reflections", "width": 1024, "height": 1024 } }] } ``` Common sdcpp-flux1 parameters: | Field | Default | Range | Notes | |-------|---------|-------|-------| | `diffuserModel` | — ✅ | AIR URN | The only required model component. A Flux.1 diffuser from the catalog. | | `prompt` | — ✅ | ≤ 1000 chars | Natural-language descriptions work best on Flux. | | `width` / `height` | `1024` | `832`–`1216`, divisible by 16 | Tighter than Comfy's `64`–`2048`. | | `steps` | `28` | `4`–`50` | Sampler steps. Diminishing returns past ~30. | | `cfgScale` | `3.5` | `1`–`20` | Classifier-free guidance. `2.5`–`4` is the sweet spot for Flux. | | `sampleMethod` | `euler` | enum | See [`SdCppSampleMethod`](/orchestration/reference/). | | `schedule` | `simple` | enum | See [`SdCppSchedule`](/orchestration/reference/). | | `negativePrompt` | *(none)* | string | Available — Comfy/Kontext flux1 variants don't expose one. | | `loras` | `{}` | `{ airUrn: strength }` | Stack multiple; strengths in `0.0`–`2.0` are typical. | | `quantity` | `1` | `1`–`4` | Number of images per call. | | `seed` | random | int64 | Pin for reproducibility. | | `vaeModel` | *(default)* | AIR URN | Override the default VAE. Usually unnecessary. | | `clipLModel` | *(default)* | AIR URN | Override the default CLIP-L. | | `t5XXLModel` | *(default)* | AIR URN | Override the default T5-XXL text encoder. 
| The default component URNs (Green-Sky's quantized GGUF releases on HuggingFace) are what the orchestrator falls back to when you omit `vaeModel` / `clipLModel` / `t5XXLModel`. They work out of the box — override only if you need a specific quantization or cached component. ### With LoRAs LoRAs are a map of AIR URN → strength, identical shape to Comfy: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "sdcpp", "ecosystem": "flux1", "operation": "createImage", "diffuserModel": "urn:air:flux1:diffuser:civitai:618692@691639", "prompt": "A detailed anime character in a magical forest, ethereal lighting", "width": 1024, "height": 1024, "loras": { "urn:air:flux1:lora:civitai:123456@789012": 0.8 } } }] } ``` ### Image-to-image (`createVariant`) Pass a source image and a new prompt; the model re-imagines it. `strength` controls how far the output departs from the source: `0.0` returns the source unchanged, `1.0` discards it entirely. `0.6`–`0.8` is the "keep composition, change style" sweet spot. ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "sdcpp", "ecosystem": "flux1", "operation": "createVariant", "diffuserModel": "urn:air:flux1:diffuser:civitai:618692@691639", "prompt": "Make it daytime with clear blue sky", "width": 1024, "height": 1024, "image": "https://image.civitai.com/.../source.jpeg", "strength": 0.7 } }] } ``` Note that `image` is a plain string URL (not a `{ url: ... }` wrapper), and the field is `strength` (not `denoiseStrength` like on Comfy). 
### Edit image (`editImage`) Alternative to `createVariant` — accepts up to two reference images and treats the prompt as an edit instruction rather than a variant direction: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "sdcpp", "ecosystem": "flux1", "operation": "editImage", "diffuserModel": "urn:air:flux1:diffuser:civitai:618692@691639", "prompt": "Make it a winter scene with snow falling", "width": 1024, "height": 1024, "images": [ "https://image.civitai.com/.../source.jpeg" ] } }] } ``` `images[]` takes up to 2 entries. Use `createVariant` when you want a strength-weighted re-imagining of a single source; use `editImage` when you want prompt-driven surgery (a more literal "do X to this picture" interpretation). ## Comfy (ComfyUI-specific knobs) When you need controls specific to ComfyUI's sampler surface — `ComfySampler` / `ComfyScheduler` enum values, a single-checkpoint AIR URN instead of separate components, or `denoiseStrength` semantics on img2img — use `engine: "comfy"`. Otherwise prefer `sdcpp`. 
### Text-to-image ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "comfy", "ecosystem": "flux1", "operation": "createImage", "model": "urn:air:flux1:checkpoint:civitai:618692@691639", "prompt": "A photorealistic portrait of a woman in a cyberpunk city, neon reflections", "width": 1024, "height": 1024, "steps": 20, "cfgScale": 3.5, "sampler": "euler", "scheduler": "simple", "quantity": 1 } }] } ``` Key differences from sdcpp: | Field | sdcpp | comfy | |-------|-------|-------| | Model spec | `diffuserModel` (+ optional components) | `model` — single checkpoint AIR URN | | Sampler | `sampleMethod` ([`SdCppSampleMethod`](/orchestration/reference/)) | `sampler` ([`ComfySampler`](/orchestration/reference/)) | | Schedule | `schedule` ([`SdCppSchedule`](/orchestration/reference/)) | `scheduler` ([`ComfyScheduler`](/orchestration/reference/)) | | Img2img strength | `strength` (`createVariant`) | `denoiseStrength` (`createVariant`) | | Max `quantity` | `4` | `12` | | Max `width` / `height` | `1216` | `2048` | | `negativePrompt` | ✅ | — | Comfy also supports `createVariant` with the same shape: `image` is still a plain string (URL, data URL, or Base64), but the strength field is `denoiseStrength` rather than sdcpp's `strength`. See the [`ComfyFlux1VariantImageGenInput` schema](/orchestration/reference/) for the full field list. ## flux1-kontext (managed editing tier) `flux1-kontext` stays inside the `flux1` ecosystem — same checkpoint family, same AIR prefix for any LoRAs/models you'd reference elsewhere in the `flux1` ecosystem — but routes inference to BFL's managed Kontext provider. Three model tiers (`dev`/`pro`/`max`), simpler input schema — just `prompt` + optional `images[]` + `aspectRatio`. No checkpoint selection, no LoRAs, no sampler knobs. The trade-off is convenience: BFL handles quality; you handle prompts and reference images. 
### Text-to-image ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "flux1-kontext", "model": "pro", "prompt": "A photograph of a cat wearing a tiny astronaut helmet", "quantity": 1 } }] } ``` ### Image editing (the Kontext strength) Pass `images[]` to edit an existing image via prompt: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "flux1-kontext", "model": "max", "prompt": "Make it daytime", "quantity": 1, "images": [ "https://image.civitai.com/.../source.jpeg" ] } }] } ``` Kontext models: | `model` | Notes | |---------|-------| | `dev` | Open-weights tier. Cheapest Kontext option. | | `pro` | Commercial tier — BFL's standard production model. Default recommendation. | | `max` | Top tier — highest quality, slowest, most expensive. Use for hero shots. | Kontext-specific parameters: | Field | Default | Notes | |-------|---------|-------| | `prompt` | — ✅ | ≤ 1000 chars. | | `images[]` | — | URLs, data URLs, or Base64. When present → image-edit mode. Omit → text-to-image. | | `aspectRatio` | `1:1` | Enum: `21:9`, `16:9`, `4:3`, `3:2`, `1:1`, `2:3`, `3:4`, `9:16`, `9:21`. | | `guidanceScale` | `3.5` | `1`–`20`. | | `quantity` | `1` | `1`–`4`. | | `seed` | random | int64. | ## Reading the result All Flux 1 paths emit the standard `imageGen` output — an `images[]` array, one entry per `quantity`: ```json { "status": "succeeded", "steps": [{ "name": "0", "$type": "imageGen", "status": "succeeded", "output": { "images": [ { "id": "blob_...", "url": "https://.../signed.jpeg" } ] } }] } ``` Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL. 
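Pulling the image URLs out of that payload is mostly dictionary traversal. A minimal sketch, assuming the workflow JSON has already been parsed into a Python dict with the shape shown above; `collect_image_urls` is a hypothetical helper name, not part of any Civitai SDK.

```python
from typing import List


def collect_image_urls(workflow: dict) -> List[str]:
    """Return the signed URLs of every image from succeeded `imageGen` steps."""
    urls = []
    for step in workflow.get("steps", []):
        # Skip other step types and steps that have not (or not successfully) finished.
        if step.get("$type") != "imageGen" or step.get("status") != "succeeded":
            continue
        output = step.get("output") or {}
        for image in output.get("images", []):
            url = image.get("url")
            if url:
                urls.append(url)
    return urls


# Sample payload with the same shape as the response above (placeholder values).
sample = {
    "status": "succeeded",
    "steps": [{
        "name": "0",
        "$type": "imageGen",
        "status": "succeeded",
        "output": {"images": [{"id": "blob_1", "url": "https://example.invalid/signed.jpeg"}]},
    }],
}
print(collect_image_urls(sample))  # ['https://example.invalid/signed.jpeg']
```

Because the URLs are signed and expire, download them promptly or refetch the workflow for fresh ones.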
## Runtime | Path | Typical wall time per 1024×1024 image | `wait` recommendation | |------|---------------------------------------|-----------------------| | `sdcpp + flux1` | 10–30 s | `wait=60` usually fine | | `comfy + flux1` | 10–30 s (LoRAs add a few seconds each) | `wait=60` usually fine | | `flux1-kontext` (dev / pro) | 10–30 s depending on BFL queue | `wait=60` usually fine | | `flux1-kontext` (max) | 15–60 s | `wait=60` sometimes, fall back to `wait=0` on busy periods | `quantity > 2` or large dimensions push you toward the 100-second request timeout — submit with `wait=0` and poll instead. ## Cost Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection. **sdcpp path** (`Flux1SdCppImageGenInput.CalculateCost`): ``` base = 0.5 × steps × (editImages + 1) × (cfgScale == 1 ? 1 : 2) total = base × quantity ``` | Shape | Buzz | |-------|------| | `createImage`, `steps: 28`, `cfgScale: 3.5`, `quantity: 1` | **~28** | | `createImage`, `steps: 28`, `quantity: 4` | ~112 | | `createVariant`, `quantity: 1` | ~28 | | `editImage` with 1 reference | ~56 | **Comfy path** (`ComfyFlux1ImageGenInput.CalculateCost`) — per-pixel + per-step scaling: ``` total = 8 × (width × height / 1024²) × (steps / 20) × quantity ``` At 1024² / `steps: 20` / `quantity: 1` → **~8 Buzz**. Comfy scales linearly with pixels and steps: 512² quarters the cost, 2048² quadruples it, `steps: 40` doubles it, and so on. **Kontext** (`flux1-kontext`, BFL-hosted) — flat per-image by tier: | Tier | Buzz per image | |------|----------------| | `dev` | **~35** | | `pro` | **~45** | | `max` | **~90** | Multiply by `quantity`. No per-step / per-pixel scaling since Kontext doesn't expose those knobs. ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with unknown property | Field not valid for this `engine` (e.g. 
`sampler` on sdcpp, `sampleMethod` on comfy, `loras` on `flux1-kontext`) | Match the schema for your chosen engine — see the tables above. | | `400` with "diffuserModel is required" | sdcpp `createImage` / `createVariant` / `editImage` without a diffuser | Supply `diffuserModel` — the only required model component on sdcpp. VAE / CLIP-L / T5-XXL default automatically. | | `400` with "model must match AIR pattern" | Passed a bare model ID or version slug | Use a full AIR URN: `urn:air:flux1:diffuser:civitai:@` (sdcpp) or `urn:air:flux1:checkpoint:civitai:@` (comfy). | | `400` with "width/height out of range" on sdcpp | sdcpp clamps tighter than Comfy (`832`–`1216`, divisible by 16) | Round to a valid multiple of 16 inside that range, or switch to the Comfy engine for more freedom. | | Output ignores the prompt on Flux.1 | `cfgScale` too low or prompt too short | Raise `cfgScale` toward 4; add lighting / composition / camera cues. | | LoRA silently has no effect | Wrong AIR URN, unpublished / private model | Verify the URN on the LoRA's Civitai page; strengths outside `0.0`–`2.0` may also be clamped. | | Kontext edit returns a generation unrelated to the source | `images[]` URL not reachable by BFL | Use a CDN-served URL (Civitai CDN works); see the source-URL notes in [Transcription → Choosing a source URL](./transcription). | | Request timed out (`wait` expired) | Large `quantity`, Kontext `max` on a busy queue | Resubmit with `wait=0` and poll. | | Step `failed`, `reason = "blocked"` | Prompt or input image hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). 
| ## Related * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here * [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling * [Flux 2 image generation](./flux2) — newer Flux family with a cleaner schema, higher quality * [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output * [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref` (use `ecosystem: "flux1"` on the enhancer) * [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling longer runs * Full parameter catalog: the `Flux1SdCppInput`, `ComfyFlux1Input`, `Flux1KontextImageGenInput` schemas in the [API reference](/orchestration/reference/) * [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator --- --- url: /orchestration/recipes/training-flux1.md --- # Flux 1 LoRA training Train a Flux.1 LoRA on your own image dataset using AI Toolkit. The output LoRA is usable directly in [Flux 1 image generation](./flux1) (sdcpp or Comfy paths). | `modelVariant` | Base model | Inference characteristics | |----------------|-----------|---------------------------| | `dev` (default) | `black-forest-labs/FLUX.1-dev` | Higher fidelity, ~20–28 sampler steps. Good default for most LoRAs. | | `schnell` | `black-forest-labs/FLUX.1-schnell` | Faster inference, 4 sampler steps, no CFG. Use when you specifically want a Schnell-targeted LoRA. | The base checkpoint is fixed by `modelVariant` — there's no `model` field to override. To train on a non-BFL Flux.1 finetune, use the [SDXL & SD1](./training-sdxl-sd1) or [other-image](./training-other-image) ecosystems instead. 
::: tip Long-running step Flux 1 training is the most expensive AI Toolkit ecosystem (200 Buzz/epoch) and runs for ~30s–2min per epoch on a typical 10-image dataset. Always use `wait=0` and follow up via polling or a webhook — see [Results & webhooks](/orchestration/guide/results-and-webhooks). ::: ## The request shape ```json { "$type": "training", "input": { "engine": "ai-toolkit", "ecosystem": "flux1", "modelVariant": "dev" // dev | schnell } } ``` ## Prerequisites * A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites)) * A training-data zip uploaded to a reachable URL (signed R2 URL, Civitai R2 AIR, or any HTTPS URL) * An accurate `count` of images in the zip ## Flux 1 dev (default) Trains on top of `FLUX.1-dev` and produces a LoRA usable with any Flux 1 dev workflow. ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training"], "steps": [{ "$type": "training", "priority": "normal", "retries": 2, "input": { "engine": "ai-toolkit", "ecosystem": "flux1", "modelVariant": "dev", "epochs": 5, "resolution": 1024, "lr": 0.0001, "trainTextEncoder": false, "lrScheduler": "cosine", "optimizerType": "adamw8bit", "networkDim": 16, "networkAlpha": 16, "trainingData": { "type": "zip", "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/6/2657604TrainingData.EYBd.zip", "count": 10 }, "samples": { "prompts": ["a photo of TOK", "TOK in a garden", "TOK portrait"] } } }] } ``` ## Flux 1 schnell Trains on top of `FLUX.1-schnell`. Inference uses 4 steps and `cfgScale: 0` — the output LoRA is meant to be used in those conditions. 
```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training"], "steps": [{ "$type": "training", "input": { "engine": "ai-toolkit", "ecosystem": "flux1", "modelVariant": "schnell", "epochs": 5, "lr": 0.0001, "trainTextEncoder": false, "networkDim": 16, "networkAlpha": 16, "trainingData": { "type": "zip", "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/6/2657604TrainingData.EYBd.zip", "count": 10 }, "samples": { "prompts": ["a photo of TOK", "TOK in a garden"] } } }] } ``` ## Common parameters {#common-parameters} Shared by both Flux 1 variants. Defaults shown are after `ApplyDefaults`. | Field | Required | Default | Notes | |-------|----------|---------|-------| | `engine` | ✅ | — | Always `ai-toolkit`. | | `ecosystem` | ✅ | — | Always `flux1` for this page. | | `modelVariant` | ✅ | — | `dev` or `schnell`. Determines the base checkpoint. | | `epochs` | | `5` | `1`–`20`. Billed per epoch. | | `numberOfRepeats` | | auto: `ceil(200 / count)` | `1`–`5000`. | | `lr` | | `0.0001` | UNet learning rate. Flux 1 is sensitive to high LRs — keep ≤ `0.0005`. | | `trainTextEncoder` | | `false` | Flux 1 does not benefit much from text-encoder training. Leave off. | | `lrScheduler` | | `cosine` | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. | | `optimizerType` | | `adamw8bit` | `adamw`, `adamw8bit`, `adam8bit`, `lion`, `lion8bit`, `adafactor`, `adagrad`, `prodigy`, `prodigy8bit`, `automagic`. | | `networkDim` | | `16` | `1`–`256`. Flux 1's lower default reflects how compactly Flux LoRAs encode style/character vs. SD-family. | | `networkAlpha` | | matches `networkDim` | `1`–`256`. | | `noiseOffset` | | `0` | `0`–`1`. | | `flipAugmentation` | | `false` | Random horizontal flips. | | `shuffleTokens` / `keepTokens` | | `false` / `0` | Caption-tag shuffling. | | `triggerWord` | | *(none)* | Activation token. 
Recommended for character / style LoRAs. | | `trainingData.{type, sourceUrl, count}` | ✅ | — | Always `type: "zip"`. | | `samples.prompts[]` | | `[]` | Preview prompts rendered after each epoch using the trained LoRA at strength 1.0. | | `samples.negativePrompt` | | *(none)* | — | ## Reading the result Same envelope as the other training recipes — see [SDXL/SD1 → Reading the result](./training-sdxl-sd1#reading-the-result) for the full shape. The relevant bit: ```json { "output": { "moderationStatus": "Approved", "epochs": [ { "epochNumber": 1, "model": { "id": "blob_...", "url": "https://.../epoch_1.safetensors" }, "samples": [{ "id": "blob_...", "url": "https://.../sample_0.jpeg" }] } ] } } ``` The `model` blob is your trained LoRA — download it (URLs are signed and expire), or use the blob URL directly with [Flux 1 image generation](./flux1) by referencing its AIR in the `loras` field. ## Runtime Per-epoch wall time on a 10-image dataset, default settings: | Variant | Per-epoch | 5-epoch full run | |---------|-----------|-------------------| | `dev` | ~60–120 s | 5–15 min | | `schnell` | ~60–120 s | 5–15 min | Always use `wait=0`. ## Cost ``` total = 200 × epochs (Buzz) ``` | Configuration | Buzz | |---------------|------| | `epochs: 5` | 1000 + samples | | `epochs: 10` | 2000 + samples | | `epochs: 20` (max) | 4000 + samples | Sample-prompt rendering is billed separately at the appropriate Flux 1 generation rate. Run with `whatif=true` (the **Preview cost** button on the widgets above) to see the exact pre-flight charge. ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with "modelVariant required" | Missing `modelVariant` field | Set to `"dev"` or `"schnell"`. | | `400` with "epochs out of range" | `epochs` outside `1`–`20` | Cap at 20. | | `400` with "trainingData.sourceUrl not reachable" | Signed URL expired | Regenerate. Prefer Civitai R2 AIRs over signed URLs for long-lived references. 
| | Trained LoRA underbaked | Too few epochs for dataset, or `lr` too low | Raise `epochs` to 8–12 for character LoRAs; keep `lr` at `0.0001`–`0.0003`. | | Trained LoRA overfits | Too many epochs / too high `networkDim` | Lower `epochs`, drop `networkDim` to 8–12. | | Step `failed`, output `moderationStatus: "Rejected"` | Dataset failed content moderation | Replace flagged images. | ## Related * [SDXL & SD1 LoRA training](./training-sdxl-sd1) — cheaper, classic SD ecosystems * [Flux 2 Klein LoRA training](./training-flux2-klein) — current Flux generation, including image-edit training * [Flux 1 image generation](./flux1) — use a trained LoRA via `loras: { "": 1.0 }` * [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running training jobs * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) / [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) * [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/training/openapi.yaml) --- --- url: /orchestration/recipes/flux2.md --- # Flux 2 image generation Flux 2 is Black Forest Labs' latest image-generation family. The orchestrator exposes every shipped variant under the `imageGen` step, selected by the `model` field: | `model` | Best for | Notes | |---------|----------|-------| | `klein` | **Default** — cheapest and most capable variant for almost every workload | Supports `createImage` / `createVariant` / `editImage`. Two size tiers (`4b` / `9b`). Takes LoRAs. Runs on Civitai infra. | | `dev` | Higher fidelity when Klein isn't enough, with LoRA support | Supports `createImage` / `editImage`. Exposes `guidanceScale` + `numInferenceSteps`. | | `flex` | Mid-tier quality, faster than `dev` | Supports `createImage` / `editImage`. Fewer tunable knobs. | | `pro` | Commercial tier — routed through BFL's provider | Supports `createImage` / `editImage`. No LoRAs. 
|
| `max` | Top commercial tier — premium hero shots | Supports `createImage` / `editImage`. Slowest + most expensive. |

**Default choice for new integrations**: `model: "klein"`, `modelVersion: "4b"`. Upgrade to `9b` when you want more fidelity on the same variant, step to `dev` for open-weights Flux 2 with the official sampler, or `pro` / `max` for BFL-managed commercial output.

## The request shape

Every Flux 2 request is a single `imageGen` step with three keys selecting the variant and operation:

```json
{
  "$type": "imageGen",
  "input": {
    "engine": "flux2",
    "model": "klein",            // klein | dev | flex | pro | max
    "operation": "createImage"   // createImage | editImage
  }
}
```

The orchestrator dispatches to the matching input schema (`Flux2KleinCreateImageInput`, `Flux2DevEditImageInput`, …), so only the fields valid for that combination are accepted — [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) will `400` on unknown ones.

::: tip `createVariant` on Klein and Dev
The native `flux2` engine exposes `createImage` and `editImage` on every model. If you want strength-weighted img2img (`createVariant`), **Klein** and **Dev** each have a second invocation path via `engine: "sdcpp"` + `ecosystem: "flux2Klein"` / `"flux2Dev"` — same models, extra operations. See [Klein → createVariant](#klein-createvariant-img2img) and [Dev createVariant](#dev-createvariant-img2img-via-sdcpp) below.
:::

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* For `editImage` / `createVariant` operations: one or more source image URLs, data URLs, or Base64 strings

## Klein (default)

Klein is the cost/capability sweet spot for almost every Flux 2 workload. It is cheap enough to generate at scale, capable enough for production output, and, alongside Dev, one of the two variants that supports `createVariant` (through the sdcpp path). It comes in two size tiers (`4b` and `9b`), exposed through the following `modelVersion` values:

| `modelVersion` | Typical use |
|----------------|-------------|
| `4b` (default) | Fastest, cheapest.
Great default. | | `4b-base` | Un-tuned 4b checkpoint — useful for custom fine-tuning, not for direct generation. | | `9b` | Higher fidelity at higher cost. Step up from `4b` when quality matters more than throughput. | | `9b-base` | Un-tuned 9b checkpoint, same caveats as `4b-base`. | | `9b-kv` | 9b with key-value caching (ComfyUI worker only). Rare; use when a worker explicitly requires it. | ### Text-to-image (`createImage`) ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60 Authorization: Bearer Content-Type: application/json { "steps": [{ "$type": "imageGen", "input": { "engine": "flux2", "model": "klein", "operation": "createImage", "modelVersion": "4b", "prompt": "A cozy cabin in the woods at sunset, cinematic lighting", "width": 1024, "height": 1024, "cfgScale": 5, "steps": 20 } }] } ``` Klein-specific parameters: | Field | Default | Range | Notes | |-------|---------|-------|-------| | `modelVersion` | `4b` | `4b` / `4b-base` / `9b` / `9b-base` / `9b-kv` | Size tier. `4b` is the default workload pick. | | `cfgScale` | `5` | `1`–`20` | Classifier-free guidance. `4`–`6` is the sweet spot on Klein. | | `steps` | `20` | `4`–`50` | Sampler steps. Klein is efficient — 20 is usually plenty. | | `sampleMethod` | `euler` | enum | [`SdCppSampleMethod`](/orchestration/reference/). | | `schedule` | `simple` | enum | [`SdCppSchedule`](/orchestration/reference/). | | `negativePrompt` | *(none)* | string | Available on Klein — not exposed on `dev` / `flex` / `pro` / `max`. | | `loras` | `{}` | `{ airUrn: strength }` | Stack multiple; strengths in `0.0`–`2.0` are typical. | Plus the shared Flux 2 fields (`prompt`, `width`, `height`, `seed`, `quantity`, `outputFormat`, `enablePromptExpansion`) — see [Common parameters](#common-parameters). 
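The ranges in the table above lend themselves to a client-side check before submitting. The helper below is a hypothetical sketch, not part of any Civitai SDK: the function name and error messages are invented for illustration, and the `width` / `height` / `prompt` limits are taken from the Common parameters table further down this page. The orchestrator remains the source of truth and may validate differently.

```python
from typing import Any, Optional

KLEIN_VERSIONS = {"4b", "4b-base", "9b", "9b-base", "9b-kv"}


def klein_create_image(prompt: str, *, model_version: str = "4b",
                       width: int = 1024, height: int = 1024,
                       cfg_scale: float = 5, steps: int = 20,
                       loras: Optional[dict[str, float]] = None) -> dict[str, Any]:
    """Build a Klein createImage workflow body, rejecting out-of-range values."""
    if not prompt or len(prompt) > 1000:
        raise ValueError("prompt must be 1 to 1000 characters")
    if model_version not in KLEIN_VERSIONS:
        raise ValueError(f"unknown modelVersion: {model_version!r}")
    for name, value in (("width", width), ("height", height)):
        if not 512 <= value <= 2048 or value % 16 != 0:
            raise ValueError(f"{name} must be 512 to 2048 and divisible by 16")
    if not 1 <= cfg_scale <= 20:
        raise ValueError("cfgScale must be between 1 and 20")
    if not 4 <= steps <= 50:
        raise ValueError("steps must be between 4 and 50")

    step_input: dict[str, Any] = {
        "engine": "flux2",
        "model": "klein",
        "operation": "createImage",
        "modelVersion": model_version,
        "prompt": prompt,
        "width": width,
        "height": height,
        "cfgScale": cfg_scale,
        "steps": steps,
    }
    if loras:
        step_input["loras"] = dict(loras)  # Klein takes a map of AIR URN to strength
    return {"steps": [{"$type": "imageGen", "input": step_input}]}


body = klein_create_image("A cozy cabin in the woods at sunset, cinematic lighting")
print(body["steps"][0]["input"]["modelVersion"])
```

The returned dictionary can be serialized with `json.dumps` and sent as the request body of the `POST` shown above.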
### Bumping up to 9b When `4b` isn't delivering enough fidelity, switch `modelVersion` to `9b` — same shape, same knobs, just a heavier model: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "flux2", "model": "klein", "operation": "createImage", "modelVersion": "9b", "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting", "width": 1024, "height": 1536, "cfgScale": 5, "steps": 24 } }] } ``` ### With LoRAs Flux 2 Klein LoRAs are a map of AIR URN → strength (same shape as [Flux 1](./flux1)): ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "flux2", "model": "klein", "operation": "createImage", "modelVersion": "4b", "prompt": "A detailed anime character in a magical forest, ethereal lighting", "width": 1024, "height": 1024, "cfgScale": 5, "steps": 20, "loras": { "urn:air:flux2:lora:civitai:2169780@2443422": 1.0 } } }] } ``` Browse the [Civitai Flux 2 LoRA catalog](https://civitai.com/models?baseModels=Flux.2+D) for AIR URNs. ### Edit image (`editImage`) Pass `images[]` (up to 2 entries) alongside a prompt treated as an edit instruction: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "flux2", "model": "klein", "operation": "editImage", "modelVersion": "4b", "prompt": "Make it a winter scene with snow falling", "width": 1024, "height": 1024, "images": [ "https://image.civitai.com/.../source.jpeg" ] } }] } ``` ### Klein createVariant (img2img) {#klein-createvariant-img2img} The native `engine: "flux2"` path doesn't expose `createVariant` on Klein, but there's a second invocation path that does: `engine: "sdcpp"` + `ecosystem: "flux2Klein"`. 
Same model, same LoRAs, same size tiers — adds `createVariant` with `image` (single source) + `strength`:

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "sdcpp",
      "ecosystem": "flux2Klein",
      "operation": "createVariant",
      "modelVersion": "4b",
      "prompt": "Make it daytime with clear blue sky",
      "width": 1024,
      "height": 1024,
      "image": "https://image.civitai.com/.../source.jpeg",
      "strength": 0.7
    }
  }]
}
```

`strength` controls how far the result departs from the source: `0.0` returns the source unchanged, `1.0` discards it entirely. `0.6`–`0.8` is the "keep composition, change style" sweet spot.

The sdcpp path also supports `createImage` and `editImage` on Klein with the same field shapes shown above under the native `flux2` engine — just swap `engine: "flux2", model: "klein"` for `engine: "sdcpp", ecosystem: "flux2Klein"`. Most users can stay on the native `flux2` engine; reach for the sdcpp path when you need `createVariant`.

## Dev — higher-fidelity open-weights

When Klein isn't delivering and you want open-weights quality, `dev` is the next step up. Supports LoRAs; exposes the native Flux 2 sampler interface (`guidanceScale`, `numInferenceSteps`):

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "flux2",
      "model": "dev",
      "operation": "createImage",
      "prompt": "A majestic cat sitting on a throne, highly detailed, 8k",
      "width": 1024,
      "height": 1024,
      "quantity": 1,
      "guidanceScale": 2.5,
      "numInferenceSteps": 28
    }
  }]
}
```

Dev-specific parameters:

| Field | Default | Range | Notes |
|-------|---------|-------|-------|
| `guidanceScale` | `2.5` | `0`–`20` | Lower = more creative, higher = sticks closer to the prompt. `2.5`–`4.0` is the sweet spot. |
| `numInferenceSteps` | `28` | `4`–`50` | Sampler steps. Diminishing returns past ~30. |
| `loras[]` | `[]` | array of `{ air, strength }` | **Note the shape difference from Klein**: Dev uses an *array* of `{ air, strength }` objects; Klein uses a *dict*.
| Dev also supports `operation: "editImage"` with `images[]` — same shape as Klein's edit, just on the richer sampler surface. ### Dev createVariant (img2img) via sdcpp Like Klein, the native `engine: "flux2"` path doesn't expose `createVariant` on Dev — but there's a second invocation path that does: `engine: "sdcpp"` + `ecosystem: "flux2Dev"`. Same model, same LoRA support, with `image` (single source) + `strength` for img2img: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "sdcpp", "ecosystem": "flux2Dev", "operation": "createVariant", "prompt": "Make it daytime with clear blue sky", "width": 1024, "height": 1024, "image": "https://image.civitai.com/.../source.jpeg", "strength": 0.7 } }] } ``` `strength` runs `0.0`–`1.0` (default `0.7`). The sdcpp path also supports `createImage` and `editImage` on Dev — most users can stay on the native `flux2` engine; reach for the sdcpp path when you need `createVariant`. ## Flex — faster, lighter Mid-tier quality, tuned for throughput. Same knobs as `dev`, slightly lower fidelity: ```json { "$type": "imageGen", "input": { "engine": "flux2", "model": "flex", "operation": "createImage", "prompt": "A serene mountain landscape with a crystal clear lake at dawn", "width": 1024, "height": 1024, "guidanceScale": 3.5, "numInferenceSteps": 28 } } ``` Also supports `editImage`. ## Pro — BFL commercial tier Routed through Black Forest Labs' production provider. No LoRAs, no sampler knobs — just prompt in, image out. Use when Klein / dev don't meet quality needs and you're willing to pay for BFL-managed output: ```json { "$type": "imageGen", "input": { "engine": "flux2", "model": "pro", "operation": "createImage", "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting", "width": 1024, "height": 1536 } } ``` ## Max — BFL flagship Top commercial tier. Slowest and most expensive. 
Use for hero shots where quality matters more than throughput: ```json { "$type": "imageGen", "input": { "engine": "flux2", "model": "max", "operation": "createImage", "prompt": "An epic fantasy battle scene with dragons, cinematic lighting, intricate details", "width": 1536, "height": 1024 } } ``` Same shape as `pro`; heavier backing model. ## Common parameters {#common-parameters} These apply across all Flux 2 models (per the [`Flux2ImageGenInput` schema](/orchestration/reference/operations/InvokeImageGenStepTemplate)): | Field | Required | Default | Notes | |-------|----------|---------|-------| | `prompt` | ✅ | — | ≤ 1000 characters. Natural-language descriptions work best — include lighting, composition, camera/lens cues. | | `width` | | `1024` | `512`–`2048`. Klein requires divisible by 16; other models have no divisibility constraint. | | `height` | | `1024` | `512`–`2048`. Klein requires divisible by 16; other models have no divisibility constraint. | | `quantity` | | `1` | `1`–`4`. Number of images returned per call. | | `outputFormat` | | `jpeg` | `jpeg` or `png`. `png` for lossless, `jpeg` for smaller files. | | `seed` | | random | `int64`. Pin for reproducibility. | | `enablePromptExpansion` | | `false` | Model-side prompt expansion — Flux rewrites your prompt before generation. Off by default. | For `editImage` operations, add `images[]` (up to 2 entries on Klein) — HTTP(S) URLs, data URLs, or Base64 strings. ## Reading the result A successful `imageGen` step emits an `images[]` array — one entry per `quantity`: ```json { "status": "succeeded", "steps": [{ "name": "0", "$type": "imageGen", "status": "succeeded", "output": { "images": [ { "id": "blob_...", "url": "https://.../signed.jpeg" } ] } }] } ``` Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL. 
## Runtime Rough ranges on Civitai-hosted infra (warm node, queue permitting): | Variant | Typical wall time per 1024×1024 image | `wait` recommendation | |---------|---------------------------------------|-----------------------| | `klein` (`4b`) | 5–15 s | `wait=60` fine for `quantity: 1` | | `klein` (`9b`) | 10–25 s | `wait=60` usually fine | | `dev`, `flex` | 10–30 s | `wait=60` usually works for `quantity ≤ 2` | | `pro`, `max` | 15–60 s depending on BFL queue | `wait=60` works sometimes; fall back to `wait=0` + polling on busy periods | Past ~2 images, large dimensions, or `pro`/`max` on a busy queue, you risk hitting the 100 s request timeout — submit with `wait=0` and poll / webhook. ## Cost Billed in Buzz on the workflow's `transactions`. Use `whatif=true` to preview the exact charge before submitting; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection. **Klein** (`Flux2KleinSdCppImageGenInput.CalculateCost`) — driven by `modelVersion`, `steps`, and `cfgScale`: ``` base = stepCost × steps × (editImages + 1) × (cfgScale == 1 ? 1 : 2) stepCost = 0.3 (4b / 4b-base), 0.5 (9b / 9b-base) total = base × quantity ``` | Variant | Shape | Buzz | |---------|-------|------| | Klein `4b`, `createImage`, `steps: 20`, `cfgScale: 5`, `quantity: 1` | default | **~12** | | Klein `4b`, `createImage`, `quantity: 4` | batch | ~48 | | Klein `4b`, `editImage` with 1 reference | edit | ~24 | | Klein `9b`, `createImage`, `steps: 24`, `cfgScale: 5` | upgrade | **~24** | **Dev / Flex / Pro / Max** use a per-megapixel formula — `ceil(width × height / 1 000 000) × costPerMegapixel × quantity`, where `costPerMegapixel` doubles when LoRAs are present on `dev`. A default 1024² `dev` createImage lands at **~40 Buzz**; expect commercial-tier variants (`pro`, `max`) to be materially higher. Run `whatif=true` when pricing matters. 
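The formulas above are simple enough to turn into a rough estimator. This is a minimal sketch under stated assumptions, not the orchestrator's `CalculateCost` implementation: the function names are hypothetical, `9b-kv` is rejected because the formula lists no step cost for it, and `cost_per_megapixel` is a caller-supplied value because this page does not state the rate per model. `whatif=true` remains the authoritative preview.

```python
import math

# Step cost in tenths of a Buzz, from the Klein formula above (0.3 and 0.5).
# Integer arithmetic avoids floating-point drift in the product.
STEP_COST_TENTHS = {"4b": 3, "4b-base": 3, "9b": 5, "9b-base": 5}


def klein_cost(model_version: str, steps: int, cfg_scale: float,
               quantity: int = 1, edit_images: int = 0) -> float:
    """Estimate Buzz for a Klein request using the documented formula."""
    if model_version not in STEP_COST_TENTHS:
        raise ValueError(f"no documented step cost for {model_version!r}")
    cfg_factor = 1 if cfg_scale == 1 else 2  # guidance on doubles the cost
    tenths = (STEP_COST_TENTHS[model_version] * steps
              * (edit_images + 1) * cfg_factor * quantity)
    return tenths / 10


def per_megapixel_cost(width: int, height: int, cost_per_megapixel: float,
                       quantity: int = 1) -> float:
    """Estimate Buzz for dev / flex / pro / max; the rate is supplied by the caller."""
    megapixels = math.ceil(width * height / 1_000_000)
    return megapixels * cost_per_megapixel * quantity


print(klein_cost("4b", steps=20, cfg_scale=5))                 # 12.0
print(klein_cost("4b", steps=20, cfg_scale=5, edit_images=1))  # 24.0
print(klein_cost("9b", steps=24, cfg_scale=5))                 # 24.0
```

Note that the per-megapixel formula divides by 1,000,000 rather than 1024², so a 1024×1024 image (1,048,576 pixels) rounds up to 2 megapixels.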
## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with "prompt must be less than 1000" | Too long | Trim; 500 chars is plenty for most prompts. | | `400` with "width/height out of range" | Outside `512`–`2048`, or not divisible by 8 (16 on Klein) | Round to a valid multiple. | | `400` with unexpected property | Field not valid for this `model`/`operation` (e.g. `loras` on `pro`, `guidanceScale` on `klein`, `cfgScale` on `dev`) | Match the schema for your chosen variant — see the tables above. Klein uses `cfgScale`/`steps`/`sampleMethod`; dev/flex use `guidanceScale`/`numInferenceSteps`. | | `400` with "createVariant is not a valid operation" on Klein / Dev (native `flux2` engine) | Native `flux2` engine only exposes `createImage` + `editImage` | Use `engine: "sdcpp"` + `ecosystem: "flux2Klein"` or `"flux2Dev"` to access `createVariant`. See [Klein createVariant](#klein-createvariant-img2img) or [Dev createVariant](#dev-createvariant-img2img-via-sdcpp). | | `400` with "LoRA not found" | AIR URN wrong or model private / not published | Verify the URN on the model's Civitai page. | | Output ignores the prompt | `enablePromptExpansion: true` with a short prompt; or guidance too low | Set `enablePromptExpansion: false` and/or raise `cfgScale` (Klein) / `guidanceScale` (dev, flex). | | Request timed out (`wait` expired) | Large `quantity`, `max`/`pro` on a busy queue | Resubmit with `wait=0` and poll. | | Step `failed`, `reason = "blocked"` | Prompt or input image hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). 
| ## Related * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here * [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling * [Flux 1 image generation](./flux1) — classic Flux.1 family (sdcpp, Comfy, Kontext editing) * [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output * [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref` * [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling longer runs * Full parameter catalog: the `Flux2Input` and `Flux2KleinSdCppInput` schemas in the [API reference](/orchestration/reference/) * [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface; import into Postman / OpenAPI Generator --- --- url: /orchestration/recipes/training-flux2-klein.md --- # Flux 2 Klein LoRA training Train a Flux 2 Klein LoRA for use with the [Flux 2 image generation](./flux2) recipe. Two size tiers, plus a special **edit-training** mode for image-editing LoRAs that take control / reference images at inference time. | `modelVariant` | Base | Buzz / epoch | Use when | |----------------|------|--------------|----------| | `4b` (default) | `FLUX.2-klein-base-4B` | 50 | Cheaper / faster training. Pairs with Klein `4b` inference. | | `9b` | `FLUX.2-klein-base-9B` | 100 | Higher fidelity. Pairs with Klein `9b` inference. | The base checkpoint is fixed by `modelVariant`; there is no `model` field on the input. Set `isEditTraining: true` to train an editing LoRA — the dataset zip layout changes (see [Edit training](#edit-training)). ::: tip Long-running step Always submit with `wait=0`. Klein training takes ~10–60s per epoch on a 10-image dataset; full multi-epoch runs land in single-digit minutes for `4b`, longer for `9b`. 
::: ## The request shape ```json { "$type": "training", "input": { "engine": "ai-toolkit", "ecosystem": "flux2klein", "modelVariant": "4b", // 4b | 9b "isEditTraining": false // optional, defaults to false } } ``` ## Prerequisites * A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites)) * A training-data zip: * For standard training: a flat zip of training images * For [edit training](#edit-training): a zip with `main/`, `control_1/`, `control_2/`, `control_3/` subfolders ## Klein 4b (default) Fastest and cheapest tier. Default for most LoRAs. ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training"], "steps": [{ "$type": "training", "priority": "normal", "retries": 2, "input": { "engine": "ai-toolkit", "ecosystem": "flux2klein", "modelVariant": "4b", "epochs": 1, "lr": 0.0005, "trainTextEncoder": false, "lrScheduler": "constant", "optimizerType": "adamw8bit", "networkDim": 2, "networkAlpha": 1, "trainingData": { "type": "zip", "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/6/2658016TrainingData.1zGG.zip", "count": 15 }, "samples": { "prompts": [ "fruit, food, no humans, blue eyes, solo, leaf, strawberry, fangs, pokemon (creature)", "no humans, pokemon (creature), cup, food, solo, bird, blush, blurry, animal focus", "no humans, candle, pokemon (creature), blurry, animal focus, solo, food, bird, standing" ] } } }] } ``` ## Klein 9b Same shape, larger base model. Recommended `epochs: 5`+, `networkDim: 32`, `lr: ~1e-4`. 
```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer
Content-Type: application/json

{
  "tags": ["training"],
  "steps": [{
    "$type": "training",
    "priority": "normal",
    "retries": 2,
    "input": {
      "engine": "ai-toolkit",
      "ecosystem": "flux2klein",
      "modelVariant": "9b",
      "epochs": 5,
      "resolution": 1024,
      "lr": 0.000102,
      "trainTextEncoder": false,
      "lrScheduler": "cosine",
      "optimizerType": "adamw8bit",
      "networkDim": 32,
      "networkAlpha": 32,
      "trainingData": {
        "type": "zip",
        "sourceUrl": "urn:air:other:other:civitai-r2:civitai-delivery-worker-prod@training-images/6/2657604TrainingData.EYBd.zip",
        "count": 1
      },
      "samples": { "prompts": [] }
    }
  }]
}
```

## Edit training {#edit-training}

Setting `isEditTraining: true` produces an **editing LoRA** — at inference time it takes one or more reference images alongside the prompt and modifies them. The dataset zip layout differs:

* `main/` — target images (what the LoRA should produce)
* `control_1/`, `control_2/`, `control_3/` — reference / source images that pair with each `main/` entry

Filenames inside the subfolders must align across folders. The result is a LoRA that works with [Flux 2 Klein → editImage](./flux2#edit-image-editimage).
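The filename-alignment requirement can be checked locally before uploading the zip. The sketch below uses only the standard library and rests on two assumptions that the page does not confirm: files are matched by stem (name without extension), and a `main/` image counts as paired when at least one `control_*/` folder holds a file with the same stem. The function name is hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

CONTROL_DIRS = ("control_1", "control_2", "control_3")


def unmatched_main_images(zip_file) -> list[str]:
    """Return stems found in main/ that have no same-named file in any control_*/ folder."""
    main: set[str] = set()
    control: set[str] = set()
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            if len(path.parts) != 2:
                continue  # only consider files directly inside a top-level folder
            folder, stem = path.parts[0], path.stem
            if folder == "main":
                main.add(stem)
            elif folder in CONTROL_DIRS:
                control.add(stem)
    return sorted(main - control)


# Build a tiny in-memory zip: "a" is paired, "b" has no control image.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("main/a.png", b"")
    zf.writestr("main/b.png", b"")
    zf.writestr("control_1/a.png", b"")
buf.seek(0)

print(unmatched_main_images(buf))  # ['b']
```

An empty list means every `main/` image has at least one control counterpart under these assumptions; the server-side validation may be stricter.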
```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training", "edit"], "steps": [{ "$type": "training", "priority": "normal", "retries": 2, "input": { "engine": "ai-toolkit", "ecosystem": "flux2klein", "modelVariant": "4b", "isEditTraining": true, "epochs": 3, "lr": 0.0001, "trainTextEncoder": false, "lrScheduler": "cosine", "optimizerType": "adamw8bit", "networkDim": 32, "networkAlpha": 32, "trainingData": { "type": "zip", "sourceUrl": "https://blobs-temp.sfo3.digitaloceanspaces.com/flux2_klein_edit_testdata.zip", "count": 3 }, "samples": { "prompts": [ "a portrait of a woman standing in a sunlit garden with flowers", "a landscape painting of rolling hills at sunset", "a painting of a cat sitting on a windowsill looking outside at a rainy day" ], "sourceImages": [ "https://blobs-temp.sfo3.digitaloceanspaces.com/sample-edit-source-1.jpg", "https://blobs-temp.sfo3.digitaloceanspaces.com/sample-edit-source-2.jpg" ] } } }] } ``` `samples.sourceImages` is required for edit training when you want preview samples — the listed URLs become the reference images for the per-epoch sample renders. ## Common parameters {#common-parameters} Defaults shown are the post-`ApplyDefaults` values for Klein. | Field | Required | Default | Notes | |-------|----------|---------|-------| | `engine` | ✅ | — | Always `ai-toolkit`. | | `ecosystem` | ✅ | — | Always `flux2klein` for this page. | | `modelVariant` | ✅ | — | `4b` or `9b`. | | `isEditTraining` | | `false` | When `true`, dataset zip must contain `main/` + `control_*/` subfolders. | | `epochs` | | `5` | `1`–`20`. Billed per epoch. | | `numberOfRepeats` | | auto: `ceil(200 / count)` | `1`–`5000`. | | `lr` | | `0.0001` | Klein is sensitive to high LRs — keep in `1e-4`–`5e-4`. | | `trainTextEncoder` | | `false` | Klein uses Qwen-3 as its text encoder; AI Toolkit does not train it. Leave `false`. 
| | `lrScheduler` | | `cosine` | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. | | `optimizerType` | | `adamw8bit` | See SDXL/SD1 page for full enum. | | `networkDim` | | `32` | `1`–`256`. Klein LoRAs are typically `16`–`32`. | | `networkAlpha` | | matches `networkDim` | `1`–`256`. | | `noiseOffset` | | `0` | `0`–`1`. | | `flipAugmentation` | | `false` | Random horizontal flips. | | `shuffleTokens` / `keepTokens` | | `false` / `0` | Caption-tag shuffling. | | `triggerWord` | | *(none)* | Activation token. | | `trainingData.{type, sourceUrl, count}` | ✅ | — | `type: "zip"`. For edit training, `count` should equal the number of `main/` entries. | | `samples.prompts[]` | | `[]` | Per-epoch preview prompts. | | `samples.negativePrompt` | | *(none)* | — | | `samples.sourceImages[]` | | `[]` | Edit-training only — reference images for sample renders. | ## Reading the result Same envelope as the other training recipes — see [SDXL/SD1 → Reading the result](./training-sdxl-sd1#reading-the-result). Each epoch yields a Klein LoRA `.safetensors` blob plus any sample images. To use the trained LoRA, register it on Civitai (or reference its blob URN directly) and pass it under `loras` in a [Flux 2 Klein generation](./flux2#klein-default) request: ```json { "$type": "imageGen", "input": { "engine": "flux2", "model": "klein", "operation": "createImage", "modelVersion": "4b", "prompt": "your prompt", "loras": { "urn:air:flux2:lora:civitai:@": 1.0 } } } ``` ## Runtime Per-epoch wall time, default settings on a 10-image dataset: | Variant | Per-epoch | Typical full run | |---------|-----------|-------------------| | `4b` | ~10–30 s | 1–5 min for 5 epochs | | `9b` | ~30–90 s | 5–15 min for 5 epochs | | `4b` + `isEditTraining` | ~20–45 s | 2–8 min for 5 epochs (more steps per epoch) | Always use `wait=0`. 
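Because a training run outlasts any inline wait, `wait=0` is usually paired with a polling loop. The sketch below is one way to write it in Python under stated assumptions: the workflow is read from `GET /v2/consumer/workflows/{workflowId}` with the same bearer token, `succeeded` and `failed` are treated as the only terminal statuses (others may exist), and the interval and timeout values are arbitrary. `fetch_workflow` and `wait_for_workflow` are hypothetical helper names.

```python
import json
import time
import urllib.request

BASE_URL = "https://orchestration.civitai.com/v2/consumer/workflows"
# Assumption: the service may report other terminal statuses as well.
TERMINAL_STATUSES = {"succeeded", "failed"}


def fetch_workflow(workflow_id, token):
    """GET the workflow once and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/{workflow_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_workflow(workflow_id, token, interval=15.0, timeout=1800.0,
                      fetch=fetch_workflow, sleep=time.sleep):
    """Poll until the workflow reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        workflow = fetch(workflow_id, token)
        if workflow.get("status") in TERMINAL_STATUSES:
            return workflow
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"workflow {workflow_id} still running after {timeout} s"
            )
        sleep(interval)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without network access. For production use, webhooks avoid polling altogether.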
## Cost ``` total = costPerEpoch × epochs costPerEpoch = 50 (4b), 100 (9b) ``` | Configuration | Buzz | |---------------|------| | Klein `4b`, `epochs: 5` | 250 + samples | | Klein `4b`, `epochs: 1` | 50 + samples | | Klein `9b`, `epochs: 5` | 500 + samples | | Klein `9b`, `epochs: 10` | 1000 + samples | Sample-prompt rendering is billed separately at Klein image-generation rates (~8 Buzz per sample for `4b`, ~16 for `9b`). Run with `whatif=true` to see exact charges. ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with "modelVariant required" | Missing `modelVariant` | Set to `"4b"` or `"9b"`. | | `400` with "isEditTraining: true requires control folders" | Edit-training zip missing `control_*/` subfolders | Repackage the zip with `main/`, `control_1/`, `control_2/`, `control_3/`. Filenames must align across folders. | | Step `failed` mentioning "training data validation" | Edit-training zip filenames don't match across `main/` and `control_*/` | Ensure the same basenames appear in `main/` and at least one `control_*/` folder. | | Trained LoRA underbaked | Too few epochs / too low `lr` | Raise `epochs` to 5–10; keep `lr` ≤ `5e-4`. | | Trained LoRA overcooked / broken samples | `lr` too high | Drop `lr` to `1e-4`–`2e-4`. | | Step `failed`, `moderationStatus: "Rejected"` | Dataset failed content moderation | Replace flagged images. | ## Related * [Flux 1 LoRA training](./training-flux1) — open-weights Flux LoRAs (Dev / Schnell) * [SDXL & SD1 LoRA training](./training-sdxl-sd1) — cheaper SD-family ecosystems * [Flux 2 image generation](./flux2) — use a trained Klein LoRA via `loras: { ... 
}` on `model: "klein"` * [Results & webhooks](/orchestration/guide/results-and-webhooks) — handling long-running training jobs * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) / [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) * [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/training/openapi.yaml) --- --- url: /orchestration/recipes/gemini.md --- # Gemini image generation The `gemini` engine exposes **Gemini 2.5 Flash Image** — the same underlying model product as Google's [`nano-banana-*`](./google) variants, but via the direct Gemini API rather than Vertex AI. Simpler input shape: no aspect-ratio or resolution picker, just prompt (+ optional reference images for edits) and a `quantity`. Uses `operation` as the discriminator, mirroring most other imageGen engines. ::: tip Gemini vs Google If you want explicit aspect-ratio control, resolution tiers, or web-search grounding, use the [`google` engine](./google) with `model: "nano-banana-2"` — same product, richer controls. Pick `gemini` when you want the minimal shape and the direct Gemini-API semantics. 
::: ## Prerequisites * A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites)) * For `editImage`: 1–4 source images (URLs, data URLs, or Base64) ## Text-to-image (`createImage`) ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60 Authorization: Bearer Content-Type: application/json { "steps": [{ "$type": "imageGen", "input": { "engine": "gemini", "model": "2.5-flash", "operation": "createImage", "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting", "quantity": 1 } }] } ``` ## Image editing (`editImage`) ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "gemini", "model": "2.5-flash", "operation": "editImage", "prompt": "Make it a winter scene with snow falling", "images": [ "https://image.civitai.com/.../source.jpeg" ] } }] } ``` Pass 1–4 reference images in `images[]` — the prompt is treated as an edit instruction applied across them. ## Parameters | Field | Required | Default | Notes | |-------|----------|---------|-------| | `model` | ✅ | — | Only `2.5-flash` exposed today. | | `operation` | ✅ | — | `createImage` or `editImage`. | | `prompt` | ✅ | — | Natural-language. No explicit cap documented; keep it reasonable. | | `quantity` | | `1` | `1`–`4`. | | `images[]` | ✅ on `editImage` | — | 1–4 entries. URLs, data URLs, or Base64. | No aspect-ratio control, no resolution tier, no seed, no safety toggle. If you need any of those, switch to the [`google` engine](./google) with `nano-banana-2`. ## Reading the result Standard `imageGen` output — an `images[]` array, one per `quantity`: ```json { "status": "succeeded", "steps": [{ "name": "0", "$type": "imageGen", "status": "succeeded", "output": { "images": [ { "id": "blob_...", "url": "https://.../signed.png" } ] } }] } ``` Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL. 
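As an illustration of walking that response, here is a small Python sketch that collects the image URLs from a workflow body shaped like the example above. The helper name `image_urls` and the sample URL are hypothetical; only the `steps[].output.images[].url` path comes from the response shown.

```python
def image_urls(workflow):
    """Collect output image URLs from every succeeded imageGen step."""
    urls = []
    for step in workflow.get("steps", []):
        if step.get("$type") != "imageGen" or step.get("status") != "succeeded":
            continue
        # `output` may be absent or null on steps that produced nothing.
        for image in (step.get("output") or {}).get("images", []):
            urls.append(image["url"])
    return urls


# Sample body mirroring the documented shape; the URL is a placeholder.
sample = {
    "status": "succeeded",
    "steps": [{
        "name": "0",
        "$type": "imageGen",
        "status": "succeeded",
        "output": {"images": [{"id": "blob_1", "url": "https://example.invalid/a.png"}]},
    }],
}
urls = image_urls(sample)
```

Since the URLs are signed and expire, download them soon after reading the workflow rather than storing them for later.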
## Runtime

Typical wall time 8–20 s per image including queue. `wait=60` is comfortable for `quantity: 1`–`2`; larger batches or busy periods warrant `wait=0` + polling.

## Cost

Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Flat per-image:

```
total = 60 × quantity
```

| Shape | Buzz |
|-------|------|
| `createImage`, `quantity: 1` | **~60** |
| `editImage` with 1 reference | ~60 |
| `createImage`, `quantity: 4` | ~240 |

Gemini 2.5 Flash's price doesn't depend on resolution (there's no `resolution` field) or the number of reference images. If you need the same product with tiered resolution pricing, the [`google` engine](./google)'s `nano-banana-2` starts higher at 1K (~104 Buzz, versus the flat ~60 here) and has a tiered scale for 2K / 4K.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with unknown property `aspectRatio` / `resolution` | Those fields live on the `google` engine, not `gemini` | Switch engines, or drop the field. |
| `400` with "images minItems" on `editImage` | Empty `images[]` | Include at least one source image when `operation: "editImage"`. |
| `400` with "images maxItems" | More than 4 source images | Trim to 4 — `google/nano-banana-2` accepts up to 10 if you need more. |
| Output doesn't look edited | Prompt described target state rather than the change | Phrase as an instruction (`"Make it a winter scene"`) rather than a description of the result. |
| Request timed out (`wait` expired) | Busy Gemini API queue | Resubmit with `wait=0` and poll. |
| Step `failed`, `reason = "blocked"` | Google's content filter | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Google image generation](./google) — Nano Banana / Imagen 4 via Vertex AI (alternate routing with richer controls)
* [OpenAI image generation](./openai) — alternative commercial tier
* [Flux 2](./flux2) / [Flux 1](./flux1) / [Qwen](./qwen) — open-weights alternatives on Civitai-hosted workers
* Full parameter catalog: the `Gemini25FlashCreateImageGenInput` and `Gemini25FlashEditImageGenInput` schemas in the [API reference](/orchestration/reference/)
* [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface

---

---
url: /site/guide/getting-started.md
description: Create an API token and make your first request to the Civitai site API.
---

# Getting started

## 1. Generate an API token

API tokens are managed from your Civitai account page:

1. Sign in to [civitai.com](https://civitai.com).
2. Open [Account settings](https://civitai.com/user/account).
3. Scroll to the **API Keys** section and click **Add API key**.
4. Give the key a name and copy the generated token — it's shown only once.

Store the token somewhere safe. Treat it like a password.

## 2. Make your first request

Most site API endpoints are public — you can call them without a token:

```bash
curl "https://civitai.com/api/v1/models?limit=1"
```

Response (truncated):

```json
{
  "items": [
    {
      "id": 827184,
      "name": "WAI-illustrious-SDXL",
      "type": "Checkpoint",
      "creator": { "username": "WAI0731" },
      "modelVersions": [ /* ... */ ]
    }
  ],
  "metadata": {
    "nextCursor": "75363|932023|257749",
    "nextPage": "https://civitai.com/api/v1/models?limit=1&cursor=..."
  }
}
```

## 3. Make an authenticated request

Some endpoints require a token — for example `/me`, which identifies the caller:

```bash
export CIVITAI_TOKEN="your-token-here"
curl -H "Authorization: Bearer $CIVITAI_TOKEN" \
  "https://civitai.com/api/v1/me"
```

```json
{
  "id": 12345,
  "username": "you",
  "tier": "founder",
  "status": "active",
  "isMember": true,
  "subscriptions": ["monthly"]
}
```

A few endpoints (`GET /models` with the `favorites` or `hidden` flag, for example) also require authentication even though the base endpoint is public. See [Authentication](./authentication) for the full list.

## Next steps

* [Authentication](./authentication) — token formats, query-param fallback, 401 behavior.
* [Pagination](./pagination) — walking through large result sets.
* [AIR identifiers](./air) — the URN format used throughout Civitai (and the Orchestration API).
* [Reference](../reference/) — parameters and response fields for every endpoint.

---

---
url: /orchestration/recipes/google.md
---

# Google image generation

Routes to Google's image-generation APIs (Vertex AI / Gemini API). Three models, selected via the `model` field:

| `model` | Also known as | Notes |
|---------|---------------|-------|
| `nano-banana-2` | Gemini 2.5 Flash Image, next-gen | **Default** — text-to-image + image-editing via `images[]`, high-resolution tier (up to 4K), optional web/Google search grounding. |
| `nano-banana-pro` | Gemini 2.5 Pro Image | Same shape as `nano-banana-2` for most purposes; pro tier for premium output. |
| `imagen4` | Imagen 4 | Google's dedicated image model (not Gemini-based). Natural-language + negative prompt, fewer aspect ratios, 1K only. |

**Default choice for new integrations**: `model: "nano-banana-2"`. It's fast, capable, supports editing via `images[]`, and has the widest aspect-ratio and resolution range.
Step up to `nano-banana-pro` for hero-shot quality; reach for `imagen4` when you specifically want Google's older Imagen family semantics (negative prompts, stricter aspect-ratio set). ::: tip Gemini vs Google The `gemini` engine ([Gemini image generation](./gemini)) exposes the same Gemini 2.5 Flash Image product as `model: "2.5-flash"` via the direct Gemini API, with a slightly different input shape. Pick based on which API semantics you prefer — this page covers the `google` engine. ::: ## Prerequisites * A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites)) * For image editing: one or more source image URLs, data URLs, or Base64 strings (Nano Banana only — Imagen 4 is create-only) ## nano-banana-2 (default — Gemini 2.5 Flash Image) ### Text-to-image ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60 Authorization: Bearer Content-Type: application/json { "steps": [{ "$type": "imageGen", "input": { "engine": "google", "model": "nano-banana-2", "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting", "aspectRatio": "1:1", "resolution": "1K", "numImages": 1 } }] } ``` ### Parameters | Field | Default | Allowed | Notes | |-------|---------|---------|-------| | `prompt` | — ✅ | ≤ 50 000 chars | Natural-language, very long prompts permitted. | | `aspectRatio` | `1:1` | `21:9`, `16:9`, `3:2`, `4:3`, `5:4`, `1:1`, `4:5`, `3:4`, `2:3`, `9:16` | | | `resolution` | `1K` | `1K` / `2K` / `4K` | Multi-resolution tier — `4K` is slower and more expensive. | | `numImages` | `1` | `1`–`4` | Nano Banana uses `numImages`, not `quantity`. | | `images[]` | *(none)* | max 10 | Passing `images[]` switches to edit mode. URLs, data URLs, or Base64. | | `seed` | random | int32 | Pin for reproducibility. | | `enableWebSearch` | `false` | boolean | Let the model ground its output in fresh web-search results. 
| | `enableGoogleSearch` | `false` | boolean | Let the model ground its output in Google Search results — useful for accurate depictions of real places/people/events. | ### Image editing Drop one or more source images into `images[]` and the same endpoint switches to edit mode — no separate `operation` field: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "google", "model": "nano-banana-2", "prompt": "Make it a winter scene with snow falling", "aspectRatio": "1:1", "resolution": "1K", "images": [ "https://image.civitai.com/.../source.jpeg" ] } }] } ``` Up to 10 reference images per call. Useful for prompt-driven edits and compositional blends. ### With web-search grounding `enableWebSearch` / `enableGoogleSearch` let the model pull fresh factual context into its generation. Handy for depicting real locations, current events, or brands accurately: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "google", "model": "nano-banana-2", "prompt": "A realistic photo of the Eiffel Tower at night, with accurate lighting and modern signage", "aspectRatio": "16:9", "resolution": "2K", "enableWebSearch": true } }] } ``` ## nano-banana-pro Pro-tier version of Nano Banana. Identical input shape minus the search-grounding toggles and `seed`. Reach for it when you want premium output quality on the same API: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "google", "model": "nano-banana-pro", "prompt": "A cinematic scene of a dragon perched on a mountain peak at dawn", "aspectRatio": "21:9", "resolution": "2K", "numImages": 1 } }] } ``` Same aspect-ratio / resolution enums, same `images[]` editing behaviour (up to 10 inputs). Most costly of the three Google models — use for hero shots, not bulk generation. ## imagen4 Google's dedicated Imagen 4 model. 
Different semantics from Nano Banana — supports `negativePrompt`, stricter aspect-ratio enum, no resolution tiers (implicit 1K), no editing: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "google", "model": "imagen4", "prompt": "A majestic fantasy landscape with floating islands, cinematic lighting", "negativePrompt": "blurry, low quality", "aspectRatio": "16:9", "numImages": 1 } }] } ``` | Field | Default | Allowed | Notes | |-------|---------|---------|-------| | `prompt` | — ✅ | ≤ 1 000 chars | Tighter than Nano Banana's 50k. | | `negativePrompt` | `""` | ≤ 1 000 chars | Imagen-specific — Nano Banana doesn't accept one. | | `aspectRatio` | `1:1` | `1:1`, `16:9`, `9:16`, `3:4`, `4:3` | Smaller set than Nano Banana. | | `numImages` | `1` | `1`–`4` | | | `seed` | random | int64 | | No editing. No `resolution` picker — outputs are always 1K. ## Reading the result All Google models emit the standard `imageGen` output: ```json { "status": "succeeded", "steps": [{ "name": "0", "$type": "imageGen", "status": "succeeded", "output": { "images": [ { "id": "blob_...", "url": "https://.../signed.png" } ] } }] } ``` Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL. ## Runtime Google's API queue is the dominant factor. Typical wall times: | Model / resolution | Per-image wall time | `wait` recommendation | |--------------------|---------------------|-----------------------| | `imagen4` (1K) | 8–20 s | `wait=60` fine | | `nano-banana-2` (1K) | 8–20 s | `wait=60` fine | | `nano-banana-2` (2K / 4K) | 20–60 s | `wait=60` sometimes; fall back to `wait=0` | | `nano-banana-pro` (any) | 20–60 s depending on queue | `wait=60` sometimes; `wait=0` + polling is safer | Enable `wait=0` + polling for batches above `numImages: 2`, 4K output, or Pro tier. ## Cost Billed in Buzz on the workflow's `transactions`. 
Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection. Flat per-image pricing by model, `resolution`, and grounding toggles: ``` total = base × numImages ``` | Model | Base (per image) | Notes | |-------|------------------|-------| | `imagen4` | **40** | Fixed; aspect ratio doesn't affect price. | | `nano-banana-2` (1K) | **104** | Default resolution tier. | | `nano-banana-2` (2K) | **156** | | | `nano-banana-2` (4K) | **208** | | | `nano-banana-pro` (1K, text-only) | **160** | | | `nano-banana-pro` (1K, with `images[]`) | **180** | Image-to-image carries a small premium. | | `nano-banana-pro` (2K, text-only) | **230** | | | `nano-banana-pro` (2K, with `images[]`) | **250** | | | `nano-banana-pro` (4K, text-only) | **320** | | | `nano-banana-pro` (4K, with `images[]`) | **340** | | **Web-search grounding** (Nano Banana 2 only) adds **+20 Buzz per image** for each flag enabled — `enableWebSearch: true` and `enableGoogleSearch: true` stack (so +40 if both on). Examples: * `imagen4`, `numImages: 1` → **~40 Buzz** * `nano-banana-2` 1K, `numImages: 1` → **~104 Buzz** * `nano-banana-2` 1K + web search, `numImages: 1` → **~124 Buzz** * `nano-banana-2` 4K, `numImages: 4` → ~832 Buzz * `nano-banana-pro` 2K text-only, `numImages: 1` → **230 Buzz**; with `images[]` → **250 Buzz** ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with unknown property `quantity` | Sent `quantity` instead of `numImages` | Google uses `numImages`; OpenAI / Flux use `quantity`. Easy to mix up. | | `400` with "aspectRatio must be one of" on Imagen 4 | Passed a Nano Banana–only ratio like `21:9` or `5:4` | Imagen 4's set is smaller — stick to `1:1`, `16:9`, `9:16`, `3:4`, `4:3`. | | `400` with "resolution is not a valid property" on Imagen 4 | Imagen 4 has no `resolution` field | Drop it — Imagen 4 is always 1K. 
| | `400` with "images is not a valid property" on Imagen 4 | Imagen 4 is create-only | Switch to `nano-banana-2` or `nano-banana-pro` for editing. | | `400` with "images maxItems" | More than 10 reference images on Nano Banana | Trim to 10. | | Output seems disconnected from reality (wrong year of events, nonexistent place) | No search grounding | Set `enableWebSearch: true` (or `enableGoogleSearch: true`) on `nano-banana-2`. | | Request timed out (`wait` expired) | Large `numImages`, 4K resolution, or Pro tier on busy queue | Resubmit with `wait=0` and poll. | | Step `failed`, `reason = "blocked"` | Google's content filter | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). | ## Related * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here * [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling * [Gemini image generation](./gemini) — Gemini 2.5 Flash Image via the direct Gemini API (alternate routing to Nano Banana) * [OpenAI image generation](./openai) — alternative commercial tier * [Flux 2](./flux2) / [Flux 1](./flux1) / [Qwen](./qwen) — open-weights alternatives on Civitai-hosted workers * [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output * [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref` * Full parameter catalog: the `Imagen4ImageGenInput`, `NanoBananaProImageGenInput`, `NanoBanana2ImageGenInput` schemas in the [API reference](/orchestration/reference/) * [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface --- --- url: /orchestration/recipes/grok.md --- # Grok image generation Routes to xAI's Grok image API. 
Two operations — `createImage` and `editImage` — and a wide aspect-ratio menu (including extreme-wide and extreme-tall variants beyond what Google or OpenAI expose).

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* For `editImage`: 1–3 source images (URLs, data URLs, or Base64)

## Text-to-image (`createImage`)

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "grok",
      "operation": "createImage",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "aspectRatio": "1:1",
      "quantity": 1
    }
  }]
}
```

### Parameters

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `operation` | ✅ | — | `createImage` or `editImage`. |
| `prompt` | ✅ | — | Natural-language. |
| `aspectRatio` | | `1:1` | See the ratio table below. |
| `quantity` | | `1` | `1`–`4`. |

### Aspect ratios

Grok exposes a wider aspect-ratio menu than other commercial engines:

| Category | Ratios |
|----------|--------|
| Ultra-wide | `2:1`, `20:9`, `19.5:9`, `16:9` |
| Landscape | `4:3`, `3:2` |
| Square | `1:1` |
| Portrait | `2:3`, `3:4` |
| Vertical | `9:16`, `9:19.5`, `9:20`, `1:2` |

Useful when you need phone-native vertical ratios (`9:19.5` / `9:20` match modern flagship screens) or cinema-wide output (`2:1`, `20:9`):

```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "grok",
      "operation": "createImage",
      "prompt": "A sweeping cinematic view of a futuristic city skyline at dusk",
      "aspectRatio": "20:9",
      "quantity": 1
    }
  }]
}
```

::: warning `21:9` isn't in the enum
Grok's widest ratio is `20:9`; `21:9` (the common cinema label) isn't accepted. Use `20:9` as the closest cinematic-wide option.
::: ## Image editing (`editImage`) Pass 1–3 reference images in `images[]`: ```json { "steps": [{ "$type": "imageGen", "input": { "engine": "grok", "operation": "editImage", "prompt": "Make it a winter scene with snow falling", "images": [ "https://image.civitai.com/.../source.jpeg" ] } }] } ``` Edit operations don't take an `aspectRatio` — the output resolution follows the source(s). ## Reading the result Standard `imageGen` output — an `images[]` array, one per `quantity`: ```json { "status": "succeeded", "steps": [{ "name": "0", "$type": "imageGen", "status": "succeeded", "output": { "images": [ { "id": "blob_...", "url": "https://.../signed.png" } ] } }] } ``` Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL. ## Runtime xAI's API queue is the dominant factor. Typical wall time 10–30 s per image. `wait=60` is comfortable for `quantity ≤ 2`; higher batches or busy periods warrant `wait=0` + polling. ## Cost Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection. Flat per-image pricing by operation: ``` total = base × quantity ``` | Operation | Base (per image) | |-----------|------------------| | `createImage` | **~26** | | `editImage` | **~29** | Examples: * `createImage`, `quantity: 1` → **~26 Buzz** * `createImage`, `quantity: 4` → ~104 Buzz * `editImage` with 1 reference, `quantity: 1` → **~29 Buzz** Aspect ratio and reference count don't affect Grok's Buzz price — the provider charges flat per-image. ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with "aspectRatio must be one of" | Ratio outside the accepted enum (e.g. `21:9`) | Pick a close equivalent from the table above — `20:9` is the closest cinematic wide. 
| | `400` with "images minItems" on edit | Empty `images[]` on `editImage` | Include 1–3 source images. | | `400` with "images maxItems" | More than 3 source images | Trim to 3. | | `400` with "quantity must be ≤ 4" | Requested more than 4 in one call | Split into multiple submissions or use a different engine with a higher cap (Flux / OpenAI gpt-image-1 / Seedream go up to 10–12). | | Request timed out (`wait` expired) | Large `quantity` or busy xAI queue | Resubmit with `wait=0` and poll. | | Step `failed`, `reason = "blocked"` | xAI content filter | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). | ## Related * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here * [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling * [OpenAI image generation](./openai) — alternative commercial tier * [Google image generation](./google) / [Gemini](./gemini) — alternative commercial tier * [Flux 2](./flux2) / [Qwen](./qwen) / [SDXL](./sdxl) — open-weights / sdcpp alternatives on Civitai-hosted workers * [Image upscaling](./image-upscaler) — chain after `imageGen` for higher-res output * [Prompt enhancement](./prompt-enhancement) — LLM-rewrite a prompt before feeding it in via `$ref` * Full parameter catalog: the `GrokCreateImageGenInput` and `GrokEditImageGenInput` schemas in the [API reference](/orchestration/reference/) * [`imageGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `imageGen` surface --- --- url: /orchestration/recipes/grok-video.md --- # Grok video generation xAI's Grok video model (Grok-Imagine-Video) via FAL, available through the `videoGen` step with `engine: "grok"`. ::: tip Grok image vs Grok video The `grok` engine here is for **video generation**. 
For Grok image generation, see the separate [`imageGen` Grok recipe](./grok). ::: Three operations: `text-to-video`, `image-to-video`, and `edit-video`. Two resolutions: 480p and 720p (default). Seven aspect ratios for text-to-video. All Grok video jobs typically run 1–4 minutes — submit with `wait=0`. ## Text-to-video ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "steps": [{ "$type": "videoGen", "input": { "engine": "grok", "operation": "text-to-video", "prompt": "A red fox trotting through a snowy forest at dusk", "aspectRatio": "16:9", "duration": 6, "resolution": "720p" } }] } ``` ## Image-to-video Pass exactly one image in `images[]` to animate from it. Aspect ratio is inferred from the source image when `aspectRatio: "auto"`: ```json { "engine": "grok", "operation": "image-to-video", "prompt": "The subject slowly turns their head and looks toward the horizon", "images": ["https://image.civitai.com/.../photo.jpeg"], "duration": 6, "resolution": "720p", "aspectRatio": "auto" } ``` ## Portrait video Text-to-video accepts a wide aspect ratio set — use `9:16` for mobile-first content: ```json { "engine": "grok", "operation": "text-to-video", "prompt": "A person speaking passionately on stage, dynamic lighting", "aspectRatio": "9:16", "duration": 6, "resolution": "720p" } ``` ## Video editing Edit an existing video guided by a prompt. The input video is automatically resized to 854×480 and truncated to 8 seconds: ```json { "engine": "grok", "operation": "edit-video", "prompt": "Transform the scene into a vintage sepia-toned film", "videoUrl": "https://example.com/input.mp4", "duration": 6, "resolution": "720p" } ``` ::: warning Edit-video is capped at 8 s Grok truncates source videos to 8 seconds. Longer inputs are trimmed automatically. Cost is based on the analyzed duration, not the requested `duration` field. 
::: ## Parameters ### Text-to-video | Field | Default | Notes | |-------|---------|-------| | `engine` | — ✅ | `"grok"` | | `operation` | `"text-to-video"` | `"text-to-video"` | | `prompt` | — ✅ | Generation prompt. | | `duration` | `6` | 1–15 seconds. | | `resolution` | `"720p"` | `"480p"` or `"720p"` | | `aspectRatio` | `"16:9"` | `"16:9"`, `"4:3"`, `"3:2"`, `"1:1"`, `"2:3"`, `"3:4"`, `"9:16"` | ### Image-to-video Same as text-to-video plus: | Field | Default | Notes | |-------|---------|-------| | `operation` | — ✅ | `"image-to-video"` | | `images[]` | — ✅ | Exactly 1 image (URL, data URL, or Base64). | | `aspectRatio` | `"auto"` | `"auto"` infers ratio from the source image. Explicit ratios also accepted. | ### Edit-video | Field | Default | Notes | |-------|---------|-------| | `operation` | — ✅ | `"edit-video"` | | `videoUrl` | — ✅ | Source video URL. Resized to 854×480, capped at 8 s. | | `duration` | `6` | Informational for the request; actual duration determined by the source video length (up to 8 s). | | `resolution` | `"720p"` | `"480p"` or `"720p"` | ## Cost Per-second pricing; `total = costPerSecond × duration`. | Operation | Resolution | Buzz/s | Example — 6 s | |-----------|------------|--------|---------------| | `text-to-video` / `image-to-video` | `480p` | 65 | **390** | | `text-to-video` / `image-to-video` | `720p` | 91 | **546** | | `edit-video` | `480p` | 78 | **468** | | `edit-video` | `720p` | 104 | **624** | For `edit-video`, cost uses the **analyzed source video duration** (capped at 8 s), not the `duration` field in the request. ## Reading the result ```json { "status": "succeeded", "steps": [{ "name": "0", "$type": "videoGen", "status": "succeeded", "output": { "video": { "id": "blob_...", "url": "https://.../signed.mp4" } } }] } ``` Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL. ## Long-running jobs Grok video typically completes in 1–4 minutes. 
Use `wait=0` + polling or webhooks: * **Webhooks** (recommended): `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks) * **Polling**: `GET /v2/consumer/workflows/{workflowId}` every 10–30 s ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with "images must have exactly 1 item" | Sent 0 or 2+ images to `image-to-video` | Image-to-video requires exactly 1 source image. | | `400` with "videoUrl is required" | Missing `videoUrl` on `edit-video` | Provide the source video URL. | | `400` with "aspectRatio must be one of" on image-to-video | Sent an unsupported ratio | Image-to-video additionally accepts `"auto"` but has the same seven explicit ratios as t2v. | | Cost higher than expected on edit-video | Source video longer than requested duration | Input is truncated to 8 s; cost is based on the actual analyzed length. | | Step `failed`, `reason = "no_provider_available"` | No Grok worker available | Retry shortly. | | Step `failed`, `reason = "blocked"` | xAI content policy | Don't retry the same input. | ## Related * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here * [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling * [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling * [Grok image generation](./grok) — Grok for images * [Kling video generation](./kling) — comparable commercial video model * [Veo 3 video generation](./veo3) — Google's video model --- --- url: /orchestration/recipes/happy-horse.md --- # Happy-Horse video generation Alibaba's Happy-Horse video model, served through FAL. Four operations cover the common video workflows: text-to-video, image-to-video, video-to-video editing, and multi-character reference generation. 
The operation is selected by an explicit `operation` discriminator — fields invalid for that operation are rejected with a `400`.

| `operation` | Required inputs | What it does |
|---|---|---|
| `textToVideo` | `prompt` | Generate a clip from a text prompt. |
| `imageToVideo` | `image` | Animate a single source image as the first frame. |
| `videoEdit` | `sourceVideo`, `prompt` | Re-paint or restyle an existing clip; optional reference images guide the look. |
| `referenceToVideo` | `prompt`, `images` (1–9) | Subject-consistent generation using up to 9 character references. Cite them as `character1`…`character9` in the prompt. |

**Default choice**: `version: "v1.0"`, `resolution: "1080p"`, `duration: 5`. All Happy-Horse jobs exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) — always submit with `wait=0`.

## The request shape

Every Happy-Horse request is a single `videoGen` step on [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). Three keys select which leaf schema the rest of the body is validated against:

```json
{
  "$type": "videoGen",
  "input": {
    "engine": "happyHorse",
    "version": "v1.0",
    "operation": "textToVideo"
  }
}
```

### Source-media inputs

`videoEdit` accepts `sourceVideo` as either:

* a Civitai AIR URN (`urn:air:…`), or
* a civitai-hosted URL (`image.civitai.com`, orchestrator blob URLs, civitai-managed R2 / B2 / Spaces).

Arbitrary third-party URLs are **not** fetched — requests that pass one are rejected with a `400`. Upload the video to Civitai first and pass the resulting URL.

`image`, `images`, and `referenceImages` go through the image pipeline and *do* accept external URLs — only `sourceVideo` has this restriction.
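
The `sourceVideo` rule above can be pre-checked on the client before spending a request. The following is a minimal sketch, not part of the API: the helper name `looks_civitai_hosted` and the `extra_hosts` parameter are hypothetical, and the built-in allowlist holds only `image.civitai.com` because that is the one hostname this page names. Blob and R2 / B2 / Spaces hostnames are not listed here, so you would need to supply the ones you actually use. The orchestrator's own `400` stays the authoritative check.

```python
from urllib.parse import urlparse

# The only hostname this page names explicitly. Other Civitai-managed hosts
# are not documented here, so callers pass them in (assumption).
KNOWN_CIVITAI_HOSTS = frozenset({"image.civitai.com"})


def looks_civitai_hosted(source_video: str, extra_hosts: frozenset = frozenset()) -> bool:
    """Best-effort pre-check for `sourceVideo`; server validation is authoritative."""
    # AIR URNs are accepted as-is.
    if source_video.startswith("urn:air:"):
        return True
    parsed = urlparse(source_video)
    # Anything that is not an http(s) URL cannot be a hosted video.
    if parsed.scheme not in ("http", "https"):
        return False
    return parsed.hostname in (KNOWN_CIVITAI_HOSTS | extra_hosts)
```

A `False` result only means the value is worth a second look before submitting; a `True` result does not guarantee the server will accept it.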
## textToVideo

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "happyHorse",
      "version": "v1.0",
      "operation": "textToVideo",
      "prompt": "A little girl walking on a road at sunset, cinematic lighting, smooth camera movement",
      "aspectRatio": "16:9",
      "resolution": "1080p",
      "duration": 5
    }
  }]
}
```

## imageToVideo

Pass a single image as the first frame; `prompt` becomes optional and only steers the motion.

```json
{
  "engine": "happyHorse",
  "version": "v1.0",
  "operation": "imageToVideo",
  "prompt": "Camera slowly pushes in",
  "image": "https://image.civitai.com/.../first-frame.jpeg",
  "resolution": "1080p",
  "duration": 5
}
```

`aspectRatio` is **not** accepted here — output dimensions are derived from the input image. Source images must be at least 300px on the short side, ≤10 MB, and within a 1:2.5–2.5:1 aspect range.

## videoEdit

Re-paint or restyle an existing clip. The output duration matches the source; `duration` on the request applies to the cost preview only.

```json
{
  "engine": "happyHorse",
  "version": "v1.0",
  "operation": "videoEdit",
  "prompt": "Repaint the scene in vibrant anime style; reference @Image1 for the character outfit",
  "sourceVideo": "https://image.civitai.com/.../clip.webm",
  "referenceImages": [
    "https://image.civitai.com/.../style.jpeg"
  ],
  "audioSetting": "auto",
  "resolution": "1080p"
}
```

* `referenceImages` is optional — pass 0–5 images. Cite them in the prompt as `@Image1`–`@Image5`.
* `audioSetting`: `"auto"` regenerates a soundtrack to match the edit; `"origin"` keeps the source audio intact.
* FAL bills both the input *and* the output second on this operation, so the per-second rate is double the other modes — see [Cost](#cost).

## referenceToVideo

Generate with 1–9 character references. Cite each in the prompt with `character1`, `character2`, … `character9`.
```json
{
  "engine": "happyHorse",
  "version": "v1.0",
  "operation": "referenceToVideo",
  "prompt": "character1 and character2 walk together through a neon-lit alley",
  "images": [
    "https://image.civitai.com/.../subject-a.jpeg",
    "https://image.civitai.com/.../subject-b.jpeg"
  ],
  "aspectRatio": "16:9",
  "resolution": "1080p",
  "duration": 5
}
```

Reference images must be ≥400 px on the short side and ≤10 MB each.

## Parameters

Shared across operations unless noted. The per-operation schema in the [API reference](/orchestration/reference/) is authoritative.

| Field | Default | Used by | Notes |
|---|---|---|---|
| `engine` | — ✅ | All | `"happyHorse"` |
| `version` | — ✅ | All | `"v1.0"` |
| `operation` | — ✅ | All | See the table above. |
| `prompt` | — ✅ | All (optional on `imageToVideo`) | Up to 2500 characters. |
| `resolution` | `"1080p"` | All | `"720p"` or `"1080p"`. |
| `duration` | `5` | All except `videoEdit`'s output | Integer seconds, 3–15. `videoEdit` clamps output to the source video's length. |
| `aspectRatio` | `"16:9"` | `textToVideo`, `referenceToVideo` | `"16:9"`, `"9:16"`, `"1:1"`, `"4:3"`, `"3:4"`. |
| `image` | — ✅ | `imageToVideo` | Single image used as the first frame. |
| `sourceVideo` | — ✅ | `videoEdit` | Civitai-hosted URL or AIR URN — not arbitrary external. |
| `referenceImages[]` | `[]` | `videoEdit` | 0–5 images. |
| `audioSetting` | `"auto"` | `videoEdit` | `"auto"` regenerates audio, `"origin"` preserves it. |
| `images[]` | — ✅ | `referenceToVideo` | 1–9 character references. |
| `seed` | random | All | Integer for reproducibility, 0–2147483647. |
| `enableSafetyChecker` | `true` | All | Disable only when you have your own moderation. |

## Cost

Billed per output second in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.
```
total = buzzPerSecond × duration
```

| Operation | 720p | 1080p |
|---|---|---|
| `textToVideo`, `imageToVideo`, `referenceToVideo` | **182** Buzz/s | **364** Buzz/s |
| `videoEdit` | **364** Buzz/s | **728** Buzz/s |

Example totals at `duration: 5`:

| Operation | 720p | 1080p |
|---|---|---|
| `textToVideo` / `imageToVideo` / `referenceToVideo` | **910** | **1 820** |
| `videoEdit` | **1 820** | **3 640** |

`videoEdit` is double the others because FAL bills both the input second and the output second — already encoded in the rate above.

## Reading the result

Same as any `videoGen` step — a single `video` blob:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Long-running jobs

Happy-Horse jobs typically complete in 2–6 minutes (longer for `videoEdit` and 1080p). All exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) — submit with `wait=0` and:

* **Webhooks** (recommended): register a callback with `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks).
* **Polling**: `GET /v2/consumer/workflows/{workflowId}` on a 10 s → 30 s → 60 s cadence.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `400` with unknown field | Field isn't valid for this `operation` | Each operation maps to its own typed schema (`HappyHorseV1Input`); check it via [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). |
| `400` "`sourceVideo` must be a Civitai AIR URN…" | Passed an external URL to `sourceVideo` | Re-upload the video to Civitai and use the resulting URL, or pass a `urn:air:…` URN. |
| `400` "referenceToVideo requires between 1 and 9 reference images" | `images` was empty or had >9 entries | Provide 1–9 images. |
| `400` "videoEdit accepts at most 5 reference images" | `referenceImages` had >5 entries | Trim to 5. |
| Step `failed`, `reason = "no_provider_available"` | FAL queue busy | Retry shortly. |
| Step `failed`, `reason = "blocked"` | Safety checker rejected input/output | Re-prompt; if you've handled moderation upstream, set `enableSafetyChecker: false`. |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling
* [Veo 3 video generation](./veo3) — comparable commercial multi-mode video model
* [Kling video generation](./kling) — another commercial multi-mode video model
* Full parameter catalog: the `HappyHorseV1Input` schemas in the [API reference](/orchestration/reference/)
* [`videoGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/videoGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `videoGen` surface

---

---
url: /orchestration/recipes/hunyuan.md
---

# HunyuanVideo generation

Tencent's HunyuanVideo open model, running on Civitai's Comfy workers. Text-to-video with LoRA support for custom subjects, styles, and motions.

```json
{
  "$type": "videoGen",
  "input": {
    "engine": "hunyuan",
    "prompt": "...",
    "width": 854,
    "height": 480,
    "duration": 5
  }
}
```

HunyuanVideo is compute-intensive — always submit with `wait=0`.
## Text-to-video

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0
Authorization: Bearer
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "hunyuan",
      "prompt": "A majestic waterfall cascading down mossy rocks in a lush rainforest, slow motion",
      "width": 854,
      "height": 480,
      "duration": 5,
      "fps": 24,
      "steps": 40
    }
  }]
}
```

## With LoRAs

Attach community LoRAs to bias subject, style, or motion. Format: `{ "air": "", "strength": 0.0–1.0 }`:

```json
{
  "engine": "hunyuan",
  "prompt": "A character from the LoRA walking through a neon-lit city at night",
  "width": 854,
  "height": 480,
  "duration": 5,
  "fps": 24,
  "steps": 40,
  "loras": [
    { "air": "urn:air:hyv1:lora:civitai:123456@789012", "strength": 0.8 }
  ]
}
```

## Using a custom model checkpoint

The default model is the base HunyuanVideo checkpoint. Override with any Civitai-hosted HunyuanVideo checkpoint using its AIR URN:

```json
{
  "engine": "hunyuan",
  "model": "urn:air:hyv1:checkpoint:civitai:@",
  "prompt": "...",
  "width": 854,
  "height": 480,
  "duration": 5
}
```

## Parameters

| Field | Default | Notes |
|-------|---------|-------|
| `engine` | — ✅ | `"hunyuan"` |
| `prompt` | — ✅ | Generation prompt. |
| `model` | *(base HunyuanVideo)* | AIR URN for an alternative checkpoint. |
| `width` | `480` | Output width in pixels. Larger → slower and more expensive. |
| `height` | `480` | Output height in pixels. |
| `duration` | `5` | 1–30 seconds. |
| `fps` | `25` | Frame rate. Common values: `24`, `25`, `30`. |
| `steps` | `40` | 10–50 diffusion steps. More steps = higher quality, longer runtime. |
| `cfgScale` | `4` | 0–100. Guidance scale — lower is more creative. |
| `loras[]` | `[]` | Array of `{ air, strength }` LoRA attachments. |
| `seed` | random | Integer for reproducibility. |

## Recommended resolutions

| Use case | `width` | `height` | Notes |
|----------|---------|----------|-------|
| Fast / prototype | `480` | `480` | Square, minimal cost. |
| Landscape 480p | `854` | `480` | 16:9, good balance. |
| Portrait 480p | `480` | `854` | 9:16 for mobile. |
| Landscape 720p | `1280` | `720` | High quality; significantly slower. |

::: tip Resolution and cost
Cost scales approximately with pixel count × duration × steps. Doubling both `width` and `height` (4× the pixel area) increases cost roughly 4×. Use `whatif=true` to preview exact cost before committing.
:::

## Cost

HunyuanVideo cost depends on `width × height × duration × fps × steps`. The formula uses GPU-second estimation with a 5× markup, rounded to the nearest 25 Buzz (minimum 100 Buzz). Use `whatif=true` to get an exact preview:

```bash
curl -s "https://orchestration.civitai.com/v2/consumer/workflows?whatif=true" \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json" \
  -d '{"steps":[{"$type":"videoGen","input":{"engine":"hunyuan","prompt":"...","width":854,"height":480,"duration":5,"steps":40}}]}'
```

Approximate ranges (854×480, 24 fps):

| Duration | Steps | Approx. Buzz |
|----------|-------|--------------|
| 3 s | 20 | ~200–400 |
| 5 s | 40 | ~500–1 000 |
| 10 s | 40 | ~1 000–2 000 |

Actual cost varies with GPU load and model.

## Reading the result

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Long-running jobs

HunyuanVideo is compute-heavy. Expect 5–30 minutes depending on resolution, duration, and steps.
Use `wait=0` + polling or webhooks:

* **Webhooks** (recommended): `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks)
* **Polling**: `GET /v2/consumer/workflows/{workflowId}` every 30–60 s

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "steps out of range" | Value outside 10–50 | Clamp to 10–50. |
| `400` with "duration out of range" | Value outside 1–30 | Clamp to 1–30. |
| Very long queue wait | Large resolution / many steps | Reduce `width`/`height` or `steps` for iteration; scale up for final renders. |
| Step `failed`, `reason = "no_provider_available"` | No Comfy worker with HunyuanVideo warm | Retry shortly. |
| Output looks blurry at high resolution | Too few steps | Increase `steps` to 40–50 for larger resolutions. |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling
* [LTX2 video generation](./ltx2) — another Comfy-based open video model, generally faster
* [WAN video generation](./wan) — another Comfy/FAL open video model with broad operation support

---

---
url: /orchestration/recipes/convert-image.md
---

# Image conversion

`convertImage` is a utility step for format conversion, resizing, and blurring. It applies zero or more transforms in order, then encodes the result to the requested format.

Cost is a flat **1 Buzz** regardless of image size, number of transforms, or output format.
## The request shape

```json
{
  "$type": "convertImage",
  "input": {
    "image": "https://...",             // source — URL, data URL, or Base64
    "transforms": [ /* optional */ ],   // resize / blur — applied in order
    "output": { "format": "jpeg" }      // required — format + per-format settings
  }
}
```

`transforms` is optional (omit to change format or settings only). `output` is required.

## Examples

### Format conversion

Convert any image to JPEG:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer
Content-Type: application/json

{
  "steps": [{
    "$type": "convertImage",
    "input": {
      "image": "https://image.civitai.com/.../source.png",
      "output": { "format": "jpeg", "quality": 85 }
    }
  }]
}
```

### Resize then convert

Resize to a target width (aspect ratio preserved) and encode to WebP:

```json
{
  "$type": "convertImage",
  "input": {
    "image": "https://...",
    "transforms": [{ "type": "resize", "targetWidth": 512 }],
    "output": { "format": "webp", "quality": 85 }
  }
}
```

### Region blur

Blur one or more rectangular areas — useful for privacy masking:

```json
{
  "$type": "convertImage",
  "input": {
    "image": "https://...",
    "transforms": [{
      "type": "blur",
      "blur": 60,
      "mode": "include",
      "regions": [
        { "x1": 50, "y1": 50, "x2": 400, "y2": 400 }
      ]
    }],
    "output": { "format": "jpeg", "quality": 85 }
  }
}
```

`mode: "include"` blurs only inside the regions; the rest stays sharp. `mode: "exclude"` blurs everything *except* the regions — use it to protect a subject while blurring the background.

### Full-image blur

`mode: "exclude"` with an empty `regions` array blurs the entire image (nothing is excluded):

```json
{ "type": "blur", "blur": 40, "mode": "exclude", "regions": [] }
```

### PNG with metadata stripped

```json
{
  "$type": "convertImage",
  "input": {
    "image": "https://...",
    "output": { "format": "png", "hideMetadata": true }
  }
}
```

## Transforms reference

Transforms run in array order.
You can chain multiple transforms — for example, resize first and then blur.

### `resize`

| Field | Default | Notes |
|-------|---------|-------|
| `type` | — ✅ | Must be `"resize"`. |
| `targetWidth` | *(none)* | Target width in pixels, 1–4096. Height is calculated to preserve aspect ratio. |

### `blur`

| Field | Default | Notes |
|-------|---------|-------|
| `type` | — ✅ | Must be `"blur"`. |
| `blur` | — ✅ | Gaussian blur intensity, 1–100. |
| `mode` | — ✅ | `"include"` — blur only inside regions. `"exclude"` — blur everywhere except regions. |
| `regions` | `[]` | Pixel-coordinate rectangles `{ x1, y1, x2, y2 }`. With `mode: "exclude"` and no regions, the entire image is blurred. With `mode: "include"` and no regions, nothing is blurred. |

## Output formats reference

### `jpeg`

| Field | Default | Notes |
|-------|---------|-------|
| `format` | — ✅ | `"jpeg"` |
| `quality` | `85` | 1–100. Higher = better quality, larger file. |
| `hideMetadata` | `false` | Strip EXIF and other metadata. |

### `png`

| Field | Default | Notes |
|-------|---------|-------|
| `format` | — ✅ | `"png"` |
| `hideMetadata` | `false` | Strip metadata. |

PNG is lossless — no quality setting.

### `webp`

| Field | Default | Notes |
|-------|---------|-------|
| `format` | — ✅ | `"webp"` |
| `quality` | `85` | 1–100. Applies only when `lossless: false`. |
| `lossless` | `false` | Enable lossless WebP compression. |
| `maxFrames` | `null` | Cap frame count for animated sources. Set to `1` to extract only the first frame. |
| `hideMetadata` | `false` | Strip metadata. |

### `gif`

| Field | Default | Notes |
|-------|---------|-------|
| `format` | — ✅ | `"gif"` |
| `maxFrames` | `null` | Cap frame count. Set to `1` to extract the first frame. |
| `hideMetadata` | `false` | Strip metadata. |

::: tip JPEG and PNG with animated sources
JPEG and PNG are inherently single-frame.
Animated source images (GIF, animated WebP) are automatically reduced to the first frame when encoding to these formats — there is no `maxFrames` field to set. Use WebP or GIF output to preserve animation.
:::

## Reading the result

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "convertImage",
    "status": "succeeded",
    "output": {
      "blob": {
        "id": "blob_...",
        "url": "https://.../signed.jpg",
        "width": 512,
        "height": 342
      }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

::: tip Result caching
`convertImage` is deterministic: the same source image, transforms, and output settings always produce the same blob. The orchestrator caches the result, so repeated identical calls skip re-processing and return the cached blob immediately.
:::

## Cost

Flat **1 Buzz** per step — regardless of source image size, number of transforms, or output format.

## Chaining with other steps

`convertImage` is most useful as a post-processing step. Chain it after `imageGen` using `$ref` to reference the previous step's output:

```json
{
  "steps": [
    {
      "name": "gen",
      "$type": "imageGen",
      "input": {
        "engine": "flux2",
        "prompt": "A photorealistic cat sitting in a sunny garden"
      }
    },
    {
      "name": "convert",
      "$type": "convertImage",
      "input": {
        "image": { "$ref": "gen", "path": "output.images[0].url" },
        "transforms": [{ "type": "resize", "targetWidth": 1024 }],
        "output": { "format": "webp", "quality": 90 }
      }
    }
  ]
}
```

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "output is required" | Missing `output` field | `output` is always required — include at minimum `{ "format": "jpeg" }`. |
| `400` with "targetWidth out of range" | Value outside 1–4096 | Clamp to 1–4096. |
| `400` with "blur out of range" | Value outside 1–100 | Clamp to 1–100. |
| `400` with "mode is required" | Blur transform sent without `mode` | `mode` is required on `blur` — set `"include"` or `"exclude"`. |
| Output height different from expected | `resize` maintains aspect ratio | Only `targetWidth` is specified; height is derived from the original aspect ratio. |
| Animated source collapsed to one frame | JPEG or PNG output requested | These formats are single-frame; use WebP or GIF output to preserve animation. |

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Image upscaling](./image-upscaler) — chain upscaling before `convertImage` for high-res output in a target format
* [Prompt enhancement](./prompt-enhancement) — another 1-Buzz utility step

---

---
url: /orchestration/recipes/image-upscaler.md
---

# Image upscaling

The `imageUpscaler` step type takes an image and returns a higher-resolution version. The **upscaler model** sets the scale factor per pass (a "4×" model like [4x-Remacri](https://civitai.com/models/147759/remacri?modelVersionId=164821) — the default — applies a 4× enlargement in one run). You can then run the same model up to 3 times in one step via `numberOfRepeats` for compounding scale.
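
To make the compounding concrete, here is a small sketch that estimates output dimensions from the per-pass factor and `numberOfRepeats`. The function name `estimate_upscaled_size` is made up for illustration, and the per-pass factor is something you read off the model's name or spec yourself; nothing here calls the API.

```python
def estimate_upscaled_size(width: int, height: int, model_scale: int = 4,
                           number_of_repeats: int = 1) -> tuple:
    """Estimate output size: each pass multiplies both sides by model_scale."""
    # The step accepts 1 to 3 repeats.
    if not 1 <= number_of_repeats <= 3:
        raise ValueError("numberOfRepeats must be between 1 and 3")
    total_scale = model_scale ** number_of_repeats
    return width * total_scale, height * total_scale
```

With a 4× model, a 1024×1024 input comes out at roughly 4096×4096 after one pass and 16384×16384 after two, so it is worth running the numbers before raising `numberOfRepeats`.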

Common uses:

* Finishing step after image generation (chain `imageGen` → `imageUpscaler`)
* Rescuing low-resolution assets
* Preparing images for print / large-format display

## Prerequisites

* A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites))
* A source image — URL, data URL, or Base64 string

## The simplest request

Use the per-recipe endpoint when you're just upscaling one image and don't need webhooks or multi-step chaining:

```http
POST https://orchestration.civitai.com/v2/consumer/recipes/imageUpscaler?wait=60
Authorization: Bearer
Content-Type: application/json

{
  "image": "https://image.civitai.com/.../00890-23.jpeg"
}
```

That's it — the defaults run the 4x-Remacri upscaler once. The response is a full [`Workflow`](/orchestration/reference/operations/GetWorkflow) whose single step carries the upscaled blob.

## Via the generic workflow endpoint

Equivalent request through [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — use this path when you need webhooks, tags, or to chain with other steps:

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer
Content-Type: application/json

{
  "steps": [{
    "$type": "imageUpscaler",
    "input": {
      "image": "https://image.civitai.com/.../00890-23.jpeg",
      "numberOfRepeats": 2
    }
  }]
}
```

## Input fields

See the [`ImageUpscalerInput` schema](/orchestration/reference/operations/InvokeImageUpscalerStepTemplate) for the full definition.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `image` | ✅ | — | URL, data URL, or raw Base64 string. Civitai CDN URLs work directly. |
| `model` | | [`4x-Remacri`](https://civitai.com/models/147759/remacri?modelVersionId=164821) (`urn:air:other:upscaler:civitai:147759@164821`) | AIR URN of the upscaler model. The model's own spec determines the **scale factor per pass**. |
| `numberOfRepeats` | | `1` | `1`–`3`. How many times to run the model end-to-end. Total scale ≈ `(model_scale) ^ numberOfRepeats`. |

### Picking a model

Two dimensions to consider:

**Content type** — different upscaler families handle different content best:

* **Photographic / real-world images** — general-purpose upscalers (ESRGAN derivatives like 4x-Remacri, the default).
* **Anime / illustrated art** — anime-tuned upscalers produce cleaner line work.
* **Faces / portraits** — face-restoration–aware upscalers reduce artifacts around features.

**Scale factor** — upscaler models advertise their scale in the name (`2x-…`, `4x-…`, `8x-…`). This is typically the multiplication factor per pass — a `4x` model on a 1024×1024 input produces 4096×4096 output in a single run. Combined with `numberOfRepeats: 2`, a 4× model produces a 16× total enlargement.

Browse [Civitai's upscaler catalog](https://civitai.com/models?tag=upscaler) and pass the AIR URN you want. Leave `model` unset to accept 4x-Remacri.

## Chaining: generate then upscale

One of the most common two-step workflows — produce at native resolution, then upscale with a single submission:

```json
{
  "steps": [
    {
      "$type": "imageGen",
      "name": "hero",
      "input": {
        "engine": "flux2",
        "model": "klein",
        "operation": "createImage",
        "modelVersion": "4b",
        "prompt": "A cat astronaut floating through neon space",
        "width": 1024,
        "height": 1024
      }
    },
    {
      "$type": "imageUpscaler",
      "name": "hero-4k",
      "input": {
        "image": { "$ref": "hero", "path": "output.images[0].url" },
        "numberOfRepeats": 1
      }
    }
  ]
}
```

The `{ "$ref": "hero", "path": "output.images[0].url" }` reference creates a dependency — `hero-4k` doesn't start until `hero` succeeds, and the upscaler's `image` field is filled in with the generated image's signed URL at runtime. See [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) for the full reference syntax.

## Targeting an exact resolution

Upscalers only know how to multiply (4× per pass with the default model).
If you need a specific output width — say, 1920 px wide for a hero image — chain a `convertImage` step after the upscaler to downscale to your exact target.

```json
{
  "steps": [
    {
      "$type": "imageGen",
      "name": "hero",
      "input": {
        "engine": "flux2",
        "model": "klein",
        "operation": "createImage",
        "modelVersion": "4b",
        "prompt": "A cat astronaut floating through neon space",
        "width": 1024,
        "height": 1024
      }
    },
    {
      "$type": "imageUpscaler",
      "name": "upscaled",
      "input": {
        "image": { "$ref": "hero", "path": "output.images[0].url" },
        "numberOfRepeats": 1
      }
    },
    {
      "$type": "convertImage",
      "name": "hero-1920",
      "input": {
        "image": { "$ref": "upscaled", "path": "output.blob.url" },
        "transforms": [
          { "type": "resize", "targetWidth": 1920 }
        ],
        "output": {
          "format": "webp",
          "quality": 85,
          "lossless": false,
          "hideMetadata": true
        }
      }
    }
  ]
}
```

What happens at runtime:

1. **`hero`** generates a 1024×1024 image.
2. **`upscaled`** runs 4x-Remacri once → 4096×4096 (intermediate, oversized).
3. **`hero-1920`** downsamples to 1920 px wide (height auto-computed from aspect ratio = 1920×1920 here) and re-encodes as WebP at quality 85.

`ResizeTransform` keeps aspect ratio — set only `targetWidth` (1–4096). For other format / quality knobs see the [`ConvertImageInput` schema](/orchestration/reference/operations/InvokeConvertImageStepTemplate); supported `format` values are `jpeg`, `png`, `webp`, `gif`.

## Reading the result

A successful `imageUpscaler` step emits a single upscaled image blob:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageUpscaler",
    "status": "succeeded",
    "output": {
      "blob": { "id": "blob_...", "url": "https://.../signed.png" }
    }
  }]
}
```

Note: `imageUpscaler` output is `blob` (singular), not `blobs[]` — the step always returns exactly one image.

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) to get a fresh URL.

## Cost

Billed in Buzz on the workflow's `transactions`.
Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection.

Cost scales with input pixel area and the total scale factor applied by `numberOfRepeats`:

```
inputMegapixels = width × height / 1 000 000
scale           = 2 ^ numberOfRepeats      // default 4x-Remacri, 2 per pass
total           = max(1, ceil(inputMegapixels)) × scale
```

| Shape | Buzz |
|-------|------|
| 512×512 input, `numberOfRepeats: 1` | **2** |
| 1024×1024 input, `numberOfRepeats: 1` | **4** |
| 2048×2048 input, `numberOfRepeats: 1` | **10** |
| 1024×1024 input, `numberOfRepeats: 2` | **8** |
| 1024×1024 input, `numberOfRepeats: 3` | **16** |

Upscaling is one of the cheapest operations exposed — even aggressive stacked passes on a 2-megapixel source land under a few dozen Buzz. The practical ceiling is usually the [upscaler's content-size cap](#runtime), not cost.

## Runtime

A single pass (`numberOfRepeats: 1`) with the default 4x-Remacri on a ~1-megapixel input usually completes in 5–15 s and fits inside `wait=60`. Multiple repeats stack both runtime *and* output size — `numberOfRepeats: 3` with a 4× model produces a 64× enlargement, which will exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) and is rarely what you actually want. Use `wait=0` plus webhooks/polling for anything beyond one pass, and keep the total scale in mind before cranking `numberOfRepeats`.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400` with "image could not be loaded" | URL not publicly reachable, or data URL malformed | Make sure the URL is fetchable without auth; re-encode the Base64 payload. |
| `400` with "numberOfRepeats out of range" | Value outside `1`–`3` | Clamp client-side. |
| Output looks soft / painterly | Default model mismatch for this content | Specify a content-appropriate `model` AIR (anime-tuned for illustration, face-aware for portraits, etc.). |
| Output has halos or ringing | `numberOfRepeats` too aggressive for the source | Drop to a single pass; or pre-denoise the source. |
| Step `failed`, `reason = "blocked"` | Source image hit content moderation | Don't retry the same input — see [Errors & retries → Step-level failures](/orchestration/guide/errors-and-retries#step-level-failures). |

## Related

* [`InvokeImageUpscalerStepTemplate`](/orchestration/reference/operations/InvokeImageUpscalerStepTemplate) — the per-recipe endpoint
* [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/imageUpscaler/openapi.yaml) — standalone OpenAPI 3.1 YAML for this endpoint, ready to import into Postman / Insomnia / OpenAPI Generator
* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — generic path for chaining
* [Video upscaling](./video-upscaler) — the `videoUpscaler` equivalent for video
* [Workflows → Dependencies](/orchestration/guide/workflows#dependencies-parallelism) — how the `$ref` / `path` step-output references work

---

---
url: /site/reference/images.md
description: >-
  Browse images posted to Civitai, with filters for post, model, version, and
  creator.
---

# Images

Images are user-submitted outputs attached to posts. This endpoint powers the gallery on civitai.com.

## List images

```
GET /api/v1/images
```

**Auth:** Public. Authenticated callers see content up to their configured browsing level; anonymous callers are capped at the public browsing level.

### Query parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `limit` | integer (0–200) | 50 | Number of items per page. |
| `page` | integer | — | 1-indexed page number. Incompatible with `cursor`. |
| `cursor` | string | — | Opaque cursor; use `metadata.nextCursor` from the previous response. |
| `postId` | integer | — | Restrict to a specific post. |
| `modelId` | integer | — | Images associated with any version of a model. |
| `modelVersionId` | integer | — | Images associated with a specific version. |
| `imageId` | integer | — | Single-image lookup. |
| `username` | string | — | Filter by uploader username. Auto-slugified. |
| `userId` | integer | — | Filter by uploader user ID. |
| `period` | `AllTime` \| `Year` \| `Month` \| `Week` \| `Day` | `AllTime` | Time window for sort metrics. |
| `sort` | `Most Reactions` \| `Most Comments` \| `Most Collected` \| `Newest` \| `Oldest` \| `Random` | `Most Reactions` | |
| `nsfw` | `None` \| `Soft` \| `Mature` \| `X` \| boolean | — | Legacy NSFW filter; prefer `browsingLevel`. |
| `browsingLevel` | integer (bitmask) | — | Raw browsing-level bitmask. Takes precedence over `nsfw`. |
| `tags` | comma-separated integers | — | Tag IDs to require on each image. |
| `type` | `image` \| `video` \| `audio` | — | Media type. |
| `baseModels` | comma-separated strings | — | Filter to outputs from specific base models. |
| `withMeta` | boolean | `false` | If `true`, include the full `meta` object (prompt, resources, etc.). |
| ### Response ```json { "items": [ { "id": 9173928, "url": "https://image.civitai.com/.../cc242d6c-f960-4274-aa1d-f22a71e705ef.jpeg", "hash": "UA8N5},:Ioni~C#laKxaoznNwvx]XmRkVstR", "width": 832, "height": 1216, "type": "image", "nsfw": true, "nsfwLevel": "Soft", "browsingLevel": 2, "createdAt": "2025-04-17T21:28:57.225Z", "postId": 1981754, "username": "Ajuro", "baseModel": "SDXL 1.0", "modelVersionIds": [9208, 249861, 258687, 332071, 345685], "stats": { "cryCount": 1770, "laughCount": 2771, "likeCount": 21692, "dislikeCount": 0, "heartCount": 8044, "commentCount": 58 }, "meta": { "Size": "832x1216", "seed": 1938345220, "steps": 45, "sampler": "DPM++ 2M", "cfgScale": 5, "clipSkip": 2, "prompt": "...", "negativePrompt": "...", "resources": [], "civitaiResources": [ { "type": "checkpoint", "modelVersionId": 345685 }, { "type": "lora", "weight": 0.65, "modelVersionId": 249861 } ] } } ], "metadata": { "nextCursor": "1|1744925337225", "nextPage": "https://civitai.com/api/v1/images?limit=100&cursor=..." } } ``` ### Field notes * `nsfwLevel` is the **string** form (`None`, `Soft`, `Mature`, `X`). `browsingLevel` is the raw bitmask — use this for precise filtering. * `hash` is a BlurHash, suitable for rendering a placeholder while the `url` loads. * `meta` is present only when the uploader included metadata at post time. The most common fields are listed above, but the object is free-form — tools like Automatic1111 and ComfyUI drop in their own keys. Treat unknown keys as opaque. * `civitaiResources` inside `meta` maps each referenced resource to its Civitai `modelVersionId`, so you can round-trip back to [`GET /model-versions/{id}`](./model-versions). * `modelVersionIds` at the top level is a deduped list of every model version referenced in `meta.civitaiResources`. ### Notes * Page-based pagination is capped at `page * limit ≤ 1000`; deep traversal requires `cursor`. See [Pagination](../guide/pagination). 
* On Civitai's "green" domain or from restricted regions, results are filtered to SFW regardless of the `nsfw` / `browsingLevel` parameter. * `/images` defaults to `limit=50`. Lower it explicitly if you're only after a handful, or raise it up to `200` for fewer round-trips. ### Examples ```bash # Newest images for a specific model curl "https://civitai.com/api/v1/images?modelId=827184&sort=Newest&limit=10" # All images in a post, with full generation metadata curl "https://civitai.com/api/v1/images?postId=1981754&withMeta=true" # Cursor-based traversal curl "https://civitai.com/api/v1/images?limit=100" | jq '.metadata.nextCursor' ``` ::: warning Filtering by `modelId` on an extremely popular checkpoint (hundreds of thousands of images) can exceed Cloudflare's 30s timeout. For large models, fetch by `postId` or walk `cursor`-based pagination with `limit=100` instead of sorting the whole set. ::: --- --- url: /orchestration/guide.md --- # Introduction The Civitai Orchestrator is an API for running AI workloads — video generation, image generation, upscaling, transcription, text-to-speech, and more — without managing the underlying infrastructure. You submit a **workflow**: a small JSON document describing what you want done. The orchestrator: 1. Converts workflow steps into **jobs** 2. Races multiple **providers** (FAL, Google, Bytedance, Civitai workers, and others) to claim each job 3. Streams results back — blobs (images/video/audio), text, or structured output You get a single contract. The orchestrator handles provider selection, capacity, retries, and capability matching behind it. 
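As a rough illustration of the "workflow as a small JSON document" idea, the sketch below assembles one in Python. The envelope (`steps`, `$type`, `input`) follows the request bodies shown in the recipes; the helper name `build_workflow` and the example prompt are assumptions for illustration, not part of the API.

```python
import json


def build_workflow(step_type: str, step_input: dict) -> dict:
    """Wrap a single step in the workflow envelope used by the recipes.

    `build_workflow` is a hypothetical helper, not an orchestrator API.
    """
    if not step_type:
        raise ValueError("step_type must be a non-empty string")
    # Copy the input so later mutation of the caller's dict does not leak in.
    return {"steps": [{"$type": step_type, "input": dict(step_input)}]}


workflow = build_workflow(
    "videoGen",
    {
        "engine": "kling-v3",
        "operation": "text-to-video",
        "prompt": "A serene mountain lake at dawn",
    },
)

# Serialise to the JSON body that would be sent to the workflows endpoint.
body = json.dumps(workflow, indent=2)
print(body)
```

Each entry in `steps` becomes one or more jobs on the orchestrator side; the client only ever deals with this one document shape.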
## When to use this API * You want to generate or transform media (video, image, audio, 3D) at scale * You want provider redundancy without writing provider-specific code * You want job tracking, webhooks, and resumable workflows out of the box * You already have an AIR (Civitai resource identifier) and want to run inference against it ## Next steps * [Quick start](./getting-started) — your first request in 5 minutes * [Recipes](/orchestration/recipes/) — end-to-end examples (WAN video, Flux images, upscaling…) * [API reference](/orchestration/reference/) — every operation, schema, and response --- --- url: /orchestration/recipes/kling.md --- # Kling video generation Kuaishou's Kling model family, available in two generations through the `videoGen` step: | `engine` | Models | Notes | |----------|--------|-------| | `kling` | `v1`, `v1.5`, `v1.6`, `v2`, `v2.5-turbo` | Original Kling. Text-to-video and image-to-video. | | `kling-v3` | *(version-agnostic)* | Kling V3. Five operations including video-to-video and reference-to-video. Duration in seconds (3–15). | **Default choice for new integrations**: `engine: "kling-v3"` with `operation: "text-to-video"`. For speed + cost, use `mode: "Standard"`; for highest quality, `mode: "Professional"`. All Kling jobs exceed the [100-second timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) — always submit with `wait=0` and handle results via webhooks or polling. 
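A minimal sketch of the recommended default, assuming the field names and limits from the Kling V3 parameter table on this page (`mode` is `Standard` or `Professional`, `duration` is an integer from 3 to 15). The helper `kling_v3_request` is hypothetical; it only builds the URL and body and sends nothing.

```python
import json

WORKFLOWS_URL = "https://orchestration.civitai.com/v2/consumer/workflows"


def kling_v3_request(prompt: str, mode: str = "Standard", duration: int = 5) -> tuple[str, str]:
    """Return (url, json_body) for a Kling V3 text-to-video submission.

    Hypothetical helper: it checks inputs against the limits documented
    in this recipe before building the request.
    """
    if mode not in ("Standard", "Professional"):
        raise ValueError("mode must be 'Standard' or 'Professional'")
    if not isinstance(duration, int) or not 3 <= duration <= 15:
        raise ValueError("kling-v3 duration must be an integer from 3 to 15")
    step_input = {
        "engine": "kling-v3",
        "operation": "text-to-video",
        "prompt": prompt,
        "mode": mode,
        "duration": duration,
    }
    body = json.dumps({"steps": [{"$type": "videoGen", "input": step_input}]})
    # wait=0 returns immediately; completion arrives via webhook or polling.
    return f"{WORKFLOWS_URL}?wait=0", body


url, body = kling_v3_request("A timelapse of a flower blooming in a sunlit meadow")
print(url)
print(body)
```

Validating `duration` client-side avoids a round trip that would end in a `400`; the server remains the authority on what is accepted.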
## Kling (original) ### Text-to-video ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "steps": [{ "$type": "videoGen", "input": { "engine": "kling", "model": "v2.5-turbo", "prompt": "A serene mountain lake at dawn with mist rolling over the water", "aspectRatio": "16:9", "duration": "5" } }] } ``` ### Image-to-video Pass `sourceImage` (URL, data URL, or Base64) to animate a start frame: ```json { "engine": "kling", "model": "v1.6", "prompt": "The subject slowly turns to face the camera", "sourceImage": "https://image.civitai.com/.../photo.jpeg", "aspectRatio": "16:9", "duration": "5", "mode": "Standard" } ``` ### Parameters | Field | Default | Notes | |-------|---------|-------| | `engine` | — ✅ | `"kling"` | | `model` | — ✅ | `"v1"` / `"v1.5"` / `"v1.6"` / `"v2"` / `"v2.5-turbo"` | | `prompt` | — ✅ | Generation prompt. | | `negativePrompt` | `null` | What to avoid. | | `mode` | `"Standard"` | `"Standard"` or `"Professional"`. Affects quality and cost for v1/v1.5/v1.6. Ignored for v2/v2.5-turbo. | | `aspectRatio` | `"16:9"` | `"16:9"`, `"9:16"`, `"1:1"` | | `duration` | `"5"` | `"5"` or `"10"` (seconds). String enum. | | `cfgScale` | `0.5` | 0–1. Prompt adherence. | | `sourceImage` | `null` | URL / data URL / Base64. Enables image-to-video. | | `cameraControl` | `null` | Fine camera motion — see [Camera control](#camera-control) below. | ### Cost | Model | 5 s | 10 s | |-------|-----|------| | `v1` / `v1.5` / `v1.6` Standard | **600** | **1 200** | | `v1` / `v1.5` / `v1.6` Professional | **1 050** | **2 100** | | `v2` | **1 200** | **2 400** | | `v2.5-turbo` | **600** | **1 200** | ### Camera control Available on all models. 
Provide a `cameraControl` object with a `config` sub-object containing any of these axes (all -10 to 10, default null = no control): | Axis | Effect | |------|--------| | `horizontal` | Translate left (−) / right (+) | | `vertical` | Translate down (−) / up (+) | | `pan` | Rotate left (−) / right (+) around Y axis | | `tilt` | Rotate down (−) / up (+) around X axis | | `roll` | Counter-clockwise (−) / clockwise (+) around Z axis | | `zoom` | Narrow FOV (−) / widen FOV (+) | ```json { "cameraControl": { "config": { "zoom": -3, "pan": 2 } } } ``` *** ## Kling V3 (`engine: "kling-v3"`) Kling V3 introduces a richer operation set via the `operation` discriminator. ### Operations | `operation` | Description | Key inputs | |-------------|-------------|------------| | `text-to-video` | Generate from a text prompt | `prompt` | | `image-to-video` | Animate a start frame (optionally to an end frame) | `sourceImage`, optionally `endImage` | | `reference-to-video` | Stylize video from reference images | `images[]` | | `video-to-video-edit` | Edit an existing video guided by a prompt | `videoUrl` | | `video-to-video-reference` | Reference an existing video's motion/structure | `videoUrl`, optionally `images[]` | ### Text-to-video ```json { "engine": "kling-v3", "operation": "text-to-video", "prompt": "A timelapse of a flower blooming in a sunlit meadow", "aspectRatio": "16:9", "duration": 5, "mode": "Standard" } ``` ### Image-to-video ```json { "engine": "kling-v3", "operation": "image-to-video", "prompt": "The cat stretches and yawns, then looks directly into the camera", "sourceImage": "https://image.civitai.com/.../photo.jpeg", "aspectRatio": "16:9", "duration": 5 } ``` Add `endImage` to interpolate between a start frame and an end frame: ```json { "engine": "kling-v3", "operation": "image-to-video", "prompt": "Smooth cinematic transition", "sourceImage": "https://.../start.jpeg", "endImage": "https://.../end.jpeg", "duration": 5 } ``` ::: warning Placeholder URLs The 
first-last-frame example uses `https://example.com/` placeholders. Replace them with publicly accessible image URLs before submitting. ::: ### Video-to-video Edit or reference the motion of an existing video: ```json { "engine": "kling-v3", "operation": "video-to-video-edit", "prompt": "Transform the scene into a vintage 1970s film aesthetic with grain", "videoUrl": "https://example.com/input.mp4", "duration": 5, "mode": "Standard" } ``` Use `video-to-video-reference` to guide generation from a video's motion without directly editing it. ### Multi-prompt (Kling V3) `multiPrompt` lets you sequence different prompts across a video timeline. Each entry has a `prompt` and a `duration` (seconds that prompt controls): ```json { "engine": "kling-v3", "operation": "text-to-video", "prompt": "Base scene description", "multiPrompt": [ { "prompt": "The camera slowly pushes in on the subject", "duration": 3 }, { "prompt": "The subject looks up and the scene brightens", "duration": 4 } ] } ``` ### Audio generation (Kling V3) Set `generateAudio: true` to produce a synchronized audio track. Optionally provide `voiceIds` to use a specific voice profile: ```json { "generateAudio": true, "voiceIds": ["voice_abc123"] } ``` For video-to-video operations, `keepAudio: true` (default) preserves the original video's audio. ### Parameters (Kling V3) | Field | Default | Notes | |-------|---------|-------| | `engine` | — ✅ | `"kling-v3"` | | `operation` | `"text-to-video"` | See operations table above. | | `prompt` | — ✅ | Generation prompt. | | `mode` | `"Standard"` | `"Standard"` or `"Professional"`. | | `duration` | `5` | 3–15 seconds (integer, unlike the original `kling` engine). | | `aspectRatio` | `"16:9"` | `"16:9"`, `"9:16"`, `"1:1"` | | `sourceImage` | `null` | Start frame for `image-to-video`. | | `endImage` | `null` | End frame for first-last-frame interpolation. | | `images[]` | `[]` | Reference images for `reference-to-video`. 
| | `videoUrl` | `null` | Source video for `video-to-video-*` operations. | | `generateAudio` | `false` | Generate a synchronized audio track. | | `voiceIds` | `null` | Voice profile IDs for audio generation. | | `keepAudio` | `true` | Preserve source video audio in video-to-video operations. | | `multiPrompt[]` | `null` | Time-sequenced prompts `{ prompt, duration }`. | ### Cost (Kling V3) Cost scales linearly with `duration`. All costs are in Buzz per second: | Operation group | Mode | Audio | Buzz/s | |-----------------|------|-------|--------| | t2v / i2v / ref | Standard | No | 219 | | t2v / i2v / ref | Standard | Yes | 292 | | t2v / i2v / ref | Professional | No | 292 | | t2v / i2v / ref | Professional | Yes | 364 | | v2v-edit / v2v-ref | Standard | — | 328 | | v2v-edit / v2v-ref | Professional | — | 437 | Examples at `duration: 5`: | Scenario | Buzz | |----------|------| | Standard t2v, no audio, 5 s | **~1 095** | | Standard t2v, with audio, 5 s | **~1 460** | | Professional t2v, no audio, 5 s | **~1 460** | | Professional t2v, with audio, 5 s | **~1 820** | | Standard video-to-video, 5 s | **~1 640** | | Professional video-to-video, 5 s | **~2 185** | *** ## Reading the result ```json { "status": "succeeded", "steps": [{ "name": "0", "$type": "videoGen", "status": "succeeded", "output": { "video": { "id": "blob_...", "url": "https://.../signed.mp4" } } }] } ``` Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL. ## Long-running jobs Kling V3 Standard at 5 s typically completes in 2–5 minutes; Professional and longer durations take longer. 
Always use `wait=0` and handle via: * **Webhooks** (recommended): `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks) * **Polling**: `GET /v2/consumer/workflows/{workflowId}` on a 10 s → 30 s → 60 s cadence ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with "duration must be one of" (kling) | Sent integer instead of string | The original `kling` engine uses string duration: `"5"` or `"10"`. | | `400` with "model is required" (kling) | Missing `model` on the original engine | `model` is required for `kling`; it is not used by `kling-v3`. | | `400` with "sourceImage is required" | Used `image-to-video` without an image | Provide `sourceImage` for `image-to-video`. | | `400` with "videoUrl is required" | Used `video-to-video-*` without a source video | Provide `videoUrl` for video-to-video operations. | | Step `failed`, `reason = "no_provider_available"` | No Kling worker available | Retry shortly. | | Output doesn't match end frame | `endImage` ignored for `text-to-video` | Use `operation: "image-to-video"` with both `sourceImage` and `endImage` to interpolate frames. | ## Related * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here * [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling * [Results & webhooks](/orchestration/guide/results-and-webhooks) — production result handling * [WAN video generation](./wan) — comparable open-source alternative * [Veo 3 video generation](./veo3) — Google alternative for commercial-grade video --- --- url: /orchestration/recipes/ltx2.md --- # LTX2 video generation LTX2 is Lightricks' open video-generation model family. The orchestrator exposes both LTX2 and the newer LTX2.3 through the `videoGen` step, running on Civitai's ComfyUI workers. This recipe covers both versions end-to-end. 
## Versions at a glance | `engine` | Models | Operations | Notes | |----------|--------|------------|-------| | `ltx2.3` | `22b-dev`, `22b-distilled` | `createVideo`, `firstLastFrameToVideo`, `editVideo`, `extendVideo`, `videoToVideo`, `audioToVideo` | Current release. Adds style transfer (`videoToVideo`) and audio-driven talking-head generation (`audioToVideo`). | | `ltx2` | `19b-dev`, `19b-distilled` | `createVideo`, `firstLastFrameToVideo`, `editVideo`, `extendVideo` | Previous release. Still supported. | **Default choice for new integrations**: `engine: "ltx2.3"`, `model: "22b-distilled"` for speed, `"22b-dev"` for maximum quality. ## The request shape Every LTX2 request is a single `videoGen` step on [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). Three keys select which LTX2 variant runs: ```json { "$type": "videoGen", "input": { "engine": "ltx2.3", // ltx2 | ltx2.3 "operation": "createVideo", // see table above "model": "22b-distilled" // version-specific } } ``` There's no `provider` discriminator — LTX2 currently only runs on Comfy. Each combination dispatches to its typed input schema (`ComfyLtx23CreateVideoInput`, `ComfyLtx2EditVideoInput`, …) so fields invalid for that combination get rejected with a `400`. ### Source-media inputs `editVideo`, `extendVideo`, `videoToVideo`, and `audioToVideo` accept `sourceVideo` / `sourceAudio` as either: * a Civitai AIR URN (`urn:air:…`), or * a civitai-hosted URL (`image.civitai.com`, orchestrator blob URLs, civitai-managed R2 / B2 / Spaces). Arbitrary third-party URLs (e.g. `raw.githubusercontent.com`, `cdn.jsdelivr.net`) are **not** fetched — requests that pass one are rejected with a `400`. Upload the media to Civitai first and pass the resulting URL. `images`, `firstFrame`, `lastFrame`, and `referenceImage` go through a separate image pipeline and *do* accept external URLs — only video/audio inputs have this restriction today. 
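The source-media rule above can be pre-checked on the client before submitting. This is a sketch under stated assumptions: the allowlist contains only `image.civitai.com` (the one host named explicitly above), HTTPS is assumed to be required, and `is_acceptable_source_media` is a hypothetical helper. The orchestrator's own validation remains authoritative.

```python
from urllib.parse import urlparse

# Assumption: illustrative allowlist. Extend it with the orchestrator blob
# and Civitai-managed storage hosts that apply to your media.
CIVITAI_HOSTS = frozenset({"image.civitai.com"})


def is_acceptable_source_media(value: str, allowed_hosts=CIVITAI_HOSTS) -> bool:
    """Client-side pre-check for `sourceVideo` / `sourceAudio` values."""
    # AIR URNs are accepted as-is.
    if value.startswith("urn:air:"):
        return True
    parsed = urlparse(value)
    # Assumption: only HTTPS URLs on an allowlisted host pass the pre-check.
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts


print(is_acceptable_source_media("https://image.civitai.com/clips/input.mp4"))
print(is_acceptable_source_media("https://raw.githubusercontent.com/user/repo/clip.mp4"))
```

A failed pre-check is a signal to upload the media to Civitai first, as described above, rather than to submit and wait for the `400`.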
## Operations

All examples target production and leave the value after `Bearer` blank; substitute your own orchestration token. LTX2 jobs typically exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline) — submit with `wait=0` and handle completion via webhooks or polling.

### createVideo

A single operation covers both **text-to-video** and **image-to-video** — add `images` to turn any text-to-video request into image-to-video.

```http
POST https://orchestration.civitai.com/v2/consumer/workflows?whatif=false&wait=0
Authorization: Bearer
Content-Type: application/json

{
  "steps": [{
    "$type": "videoGen",
    "input": {
      "engine": "ltx2.3",
      "operation": "createVideo",
      "model": "22b-distilled",
      "prompt": "A beautiful sunset over the ocean with waves crashing",
      "duration": 5,
      "width": 1280,
      "height": 720,
      "fps": 24,
      "generateAudio": false,
      "guidanceScale": 4,
      "numInferenceSteps": 20
    }
  }]
}
```

Image-to-video: pass one or more images via `images`.

```json
{
  "engine": "ltx2.3",
  "operation": "createVideo",
  "model": "22b-dev",
  "prompt": "The cat starts walking and exploring",
  "images": [
    "https://image.civitai.com/.../42750475.jpeg"
  ],
  "duration": 5,
  "width": 1280,
  "height": 720,
  "fps": 24
}
```

### firstLastFrameToVideo

Interpolate between two keyframes (or extend from a single first frame).

```json
{
  "engine": "ltx2.3",
  "operation": "firstLastFrameToVideo",
  "model": "22b-dev",
  "prompt": "smooth transition from morning to night",
  "firstFrame": "https://.../start.jpeg",
  "lastFrame": "https://.../end.jpeg",
  "frameGuideStrength": 0.8,
  "duration": 5,
  "width": 1280,
  "height": 720,
  "fps": 24
}
```

Omit `lastFrame` to seed the motion from just the first frame.

### editVideo

Input video + prompt → transformed video. Uses Canny edge-maps for structural preservation.
```json { "engine": "ltx2.3", "operation": "editVideo", "model": "22b-dev", "prompt": "Transform the scene into a cyberpunk aesthetic with neon lighting", "sourceVideo": "https://.../input.mp4", "cannyLowThreshold": 0.4, "cannyHighThreshold": 0.8, "guideStrength": 0.7 } ``` ### extendVideo Continue an existing clip for `numFrames` more frames. ```json { "engine": "ltx2.3", "operation": "extendVideo", "model": "22b-dev", "prompt": "The scene continues with gentle camera push-in", "sourceVideo": "https://.../clip.mp4", "numFrames": 48, "fps": 24 } ``` ### videoToVideo *(LTX2.3 only)* Style-transfer an entire video. ```json { "engine": "ltx2.3", "operation": "videoToVideo", "model": "22b-dev", "prompt": "Rendered in the style of a watercolor painting", "sourceVideo": "https://.../clip.mp4" } ``` ### audioToVideo *(LTX2.3 only)* Audio-driven generation. With just `sourceAudio`, produces a matching visual scene; add `referenceImage` for talking-head / lip-sync output. ```json { "engine": "ltx2.3", "operation": "audioToVideo", "model": "22b-dev", "prompt": "A person speaks directly to camera with natural lip movements", "negativePrompt": "frozen lips, off-sync lips, blurry", "sourceAudio": "https://.../voiceover.mp3", "referenceImage": "https://.../portrait.jpeg", "audioToVideoAttentionScale": 2.0, "imageGuideStrength": 0.7, "duration": 5, "width": 1280, "height": 720, "fps": 24 } ``` ## Common parameters Shared across most (engine, operation) combinations. The per-variant schema in the [API reference](/orchestration/reference/) is authoritative. | Field | Typical values | Notes | |-------|----------------|-------| | `model` | `22b-dev` / `22b-distilled` (2.3); `19b-dev` / `19b-distilled` (2.0) | `-distilled` is faster with slightly lower fidelity; `-dev` is maximum quality. | | `width` / `height` | `1280×720`, `720×1280`, `1024×1024` | Vertical for phones: swap to `720×1280`. 
|
| `duration` | `3`–`20` seconds | Clip length in seconds. The examples on this page use `5`; longer clips cost more. |
| `fps` | `24`, `30` | Frame rate of the generated clip. |
| `guidanceScale` | `3`–`7` | Prompt adherence. Higher = closer to prompt but less creative. |
| `numInferenceSteps` | `8`–`50` | `20`–`40` is the practical quality sweet spot. More steps = higher quality, longer runtime. |
| `generateAudio` | `true` / `false` | Emit a soundtrack alongside the video. |
| `negativePrompt` | string | What you *don't* want. |
| `seed` | integer | Reproducibility. |
| `loras` | object | Attach community LoRAs to bias style or subject. Format: `{ "urn:air:lora:civitai:@": 0.8 }` — a dictionary keyed by AIR URN with the strength as the value. |

## Choosing a model

| Need | Pick |
|------|------|
| Fastest turnaround, batch generation | `22b-distilled` (or `19b-distilled`) |
| Highest fidelity, final-quality renders | `22b-dev` |
| Parity with an older pipeline | `19b-dev` / `19b-distilled` |

## Reading the result

Same as any `videoGen` step — a single `video` blob per clip:

```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "videoGen",
    "status": "succeeded",
    "output": {
      "video": { "id": "blob_...", "url": "https://.../signed.mp4" }
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call [`GetBlob`](/orchestration/reference/operations/GetBlob) for a fresh URL.

## Long-running jobs

LTX2.3 `22b-dev` at 1280×720 / 5 s typically runs 2–5 minutes; `editVideo` and `audioToVideo` can go longer.
All of these exceed the [100-second request timeout](/orchestration/guide/getting-started#_3-poll-if-you-didn-t-wait-inline), so prefer `wait=0` and: * **Webhooks** (recommended): register a callback with `type: ["workflow:succeeded", "workflow:failed"]` — see [Results & webhooks](/orchestration/guide/results-and-webhooks) * **Polling**: `GET /v2/consumer/workflows/{workflowId}` on a 10 s → 30 s → 60 s cadence ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with unknown field | Field isn't valid for this `(engine, operation)` combo | Check the specific `ComfyLtxInput` schema via [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow). | | `400` "'sourceVideo' / 'sourceAudio' must be a Civitai AIR URN…" | Passed an external URL to `sourceVideo` or `sourceAudio` | Re-upload the media to Civitai and use the civitai-hosted URL, or pass a `urn:air:…` URN. See [Source-media inputs](#source-media-inputs). | | Step `failed`, `reason = "no_provider_available"` | No Comfy worker has the requested model warm | Retry shortly; or try the other model (`-dev` ↔ `-distilled`). | | Audio-to-video lip-sync poor | Attention scale too low, or audio clipping | Raise `audioToVideoAttentionScale` (e.g. `2.0` → `3.0`); re-encode source audio at constant bitrate. | | Edit-video loses structure | Canny guide too weak | Raise `guideStrength` (`0.7` → `0.85`) or widen the Canny thresholds. | ## Cost Billed in Buzz on the workflow's `transactions`. Use `whatif=true` for an exact preview; see [Payments (Buzz)](/orchestration/guide/submitting-work#payments-buzz) for currency selection. 
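For a rough pre-flight estimate, the LTX2 pricing formula given in this Cost section can be written as a small function. This is a sketch, not the billing implementation: `estimate_ltx2_buzz` is a hypothetical name, and it assumes `steps` in the formula corresponds to the request's `numInferenceSteps`. `whatif=true` remains the exact preview.

```python
import math


def estimate_ltx2_buzz(width: int, height: int, duration_s: float, fps: int, steps: int = 20) -> int:
    """Estimate Buzz cost: pixel volume x per-pixel rate x steps multiplier."""
    num_frames = duration_s * fps
    pixel_volume_mp = (width * height * num_frames) / 1_000_000
    # Assumption: `steps` is the request's numInferenceSteps; 20 steps is the 1x baseline.
    steps_multiplier = steps / 20
    return math.ceil(pixel_volume_mp * 0.0008 * 1000 * 1.5 * steps_multiplier)


print(estimate_ltx2_buzz(1280, 720, 5, 24))             # 133
print(estimate_ltx2_buzz(1280, 720, 5, 24, steps=40))   # 266
print(estimate_ltx2_buzz(1280, 720, 10, 24))            # 266
print(estimate_ltx2_buzz(1920, 1080, 5, 24))            # 299
```

Doubling either the duration or the step count doubles the estimate, which is why the second and third calls agree.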
All LTX2 / LTX2.3 variants use the same formula — pixel volume × a per-pixel rate × a steps multiplier:

```
numFrames        = duration × fps
pixelVolumeInMP  = (width × height × numFrames) / 1 000 000
stepsMultiplier  = steps / 20
total            = ceil(pixelVolumeInMP × 0.0008 × 1000 × 1.5 × stepsMultiplier)
```

| Shape | Buzz |
|-------|------|
| 720p (1280×720), 5 s @ 24 fps, `steps: 20` | **~133** |
| 720p, 5 s @ 24 fps, `steps: 40` | ~266 |
| 720p, 10 s @ 24 fps, `steps: 20` | ~266 |
| 1080p (1920×1080), 5 s @ 24 fps, `steps: 20` | ~299 |

`extendVideo` and `editVideo` scale by their total output frame count the same way. LTX2 is the cheapest video-gen path Civitai exposes — expect roughly linear cost growth with pixels × frames × steps.

## Related

* [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) — operation used by every example here
* [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) — for polling
* [Results & webhooks](/orchestration/guide/results-and-webhooks) — production-ready result handling
* [WAN video generation](./wan) — comparable recipe for the WAN model family
* Full parameter catalog: the `ComfyLtx23Input` and `ComfyLtx2Input` schemas in the [API reference](/orchestration/reference/)
* [`videoGen` endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/videoGen/openapi.yaml) — standalone OpenAPI 3.1 YAML covering the full `videoGen` surface (WAN, LTX2, Flux, etc.); import into Postman / OpenAPI Generator

---

---
url: /orchestration/recipes/training-ltx2.md
---

# LTX2 video LoRA training

Train a Lightricks LTX video LoRA on a small set of source video clips using AI Toolkit. The output LoRA is usable in [LTX2 video generation](./ltx2).

| `ecosystem` | Base | Buzz / epoch | Notes |
|-------------|------|--------------|-------|
| `ltx2` | `Lightricks/LTX-2` (19B) | variable (formula-based) | Original LTX2. Cost is computed per-step from clip count + duration.
| | `ltx23` | `Lightricks/LTX-2.3` (22B) | 200 (flat) | Newer LTX 2.3. Higher per-epoch cost reflects the heavier model — kept high deliberately to disincentivize very long runs. | The base checkpoint is fixed by `ecosystem`; there's no `model` field on the input. ::: tip Long-running step Video training is the slowest training mode on the platform. LTX 2.3 in particular is expensive — keep `epochs` ≤ 3 unless you have a clear reason. Always use `wait=0` and follow up via webhook or polling. ::: ## The request shape ```json { "$type": "training", "input": { "engine": "ai-toolkit", "ecosystem": "ltx2" // ltx2 | ltx23 } } ``` ## Prerequisites * A Civitai orchestration token ([Quick start → Prerequisites](/orchestration/guide/getting-started#prerequisites)) * A training-data zip containing source video clips * An accurate `count` of clips in the zip ## LTX2 Original 19B-parameter LTX video model. `resolution: 768` is the typical training resolution. ```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training", "video"], "steps": [{ "$type": "training", "priority": "normal", "retries": 2, "input": { "engine": "ai-toolkit", "ecosystem": "ltx2", "epochs": 2, "resolution": 768, "lr": 0.0002, "trainTextEncoder": false, "lrScheduler": "cosine", "optimizerType": "adamw8bit", "networkDim": 32, "networkAlpha": 32, "trainingData": { "type": "zip", "sourceUrl": "https://civitai-delivery-worker-prod.5ac0637cfd0766c97916cefa3764fbdf.r2.cloudflarestorage.com/training-images/4470934/2725414TrainingData.nuB3.zip", "count": 4 }, "samples": { "prompts": ["a video of TOK", "TOK moving in a garden"] } } }] } ``` ## LTX 2.3 Newer 22B model. Same shape as LTX2; `lr` is typically lower and the per-epoch cost is materially higher (200 Buzz / epoch vs. ltx2's variable formula-based cost). 
```http POST https://orchestration.civitai.com/v2/consumer/workflows?wait=0 Authorization: Bearer Content-Type: application/json { "tags": ["training", "video"], "steps": [{ "$type": "training", "priority": "normal", "retries": 2, "input": { "engine": "ai-toolkit", "ecosystem": "ltx23", "epochs": 2, "lr": 0.0001, "trainTextEncoder": false, "lrScheduler": "cosine", "optimizerType": "adamw8bit", "networkDim": 32, "networkAlpha": 32, "trainingData": { "type": "zip", "sourceUrl": "https://civitai-delivery-worker-prod.5ac0637cfd0766c97916cefa3764fbdf.r2.cloudflarestorage.com/training-images/4470934/2725414TrainingData.nuB3.zip", "count": 4 }, "samples": { "prompts": ["a video of TOK", "TOK moving in a garden"] } } }] } ``` ## Common parameters {#common-parameters} Defaults shown are the post-`ApplyDefaults` values for both LTX ecosystems. | Field | Required | Default | Notes | |-------|----------|---------|-------| | `engine` | ✅ | — | Always `ai-toolkit`. | | `ecosystem` | ✅ | — | `ltx2` or `ltx23`. | | `epochs` | | `5` | `1`–`20`. Billed per epoch. Keep low (2–3) for video. | | `numberOfRepeats` | | (no auto-default) | `1`–`5000`. | | `lr` | | `0.0001` | LTX2 examples often use `0.0002`; LTX 2.3 typically `0.0001`. | | `trainTextEncoder` | | `false` | Leave off — LTX text encoder is not retrained by AI Toolkit. | | `lrScheduler` | | `cosine` | `constant`, `constant_with_warmup`, `cosine`, `linear`, `step`. | | `optimizerType` | | `adamw8bit` | See SDXL/SD1 page for full enum. | | `networkDim` | | `32` | `1`–`256`. | | `networkAlpha` | | matches `networkDim` | `1`–`256`. | | `noiseOffset` | | `0` | `0`–`1`. | | `flipAugmentation` | | `false` | Random horizontal flips. | | `shuffleTokens` / `keepTokens` | | `false` / `0` | Caption-tag shuffling. | | `triggerWord` | | *(none)* | Activation token. | | `trainingData.{type, sourceUrl, count}` | ✅ | — | `type: "zip"`. Zip should contain video clips. | | `samples.prompts[]` | | `[]` | Per-epoch preview videos. 
| | `samples.negativePrompt` | | *(none)* | — | ## Reading the result Same envelope as the other training recipes — see [SDXL/SD1 → Reading the result](./training-sdxl-sd1#reading-the-result). Each epoch yields a video LoRA `.safetensors` blob plus any sample `.mp4` files. Use the trained LoRA in [LTX2 video generation](./ltx2) by referencing it in the workflow's `loras` field. ## Runtime Per-epoch wall time, default settings on a 4-clip dataset: | Ecosystem | Per-epoch | Typical full run | |-----------|-----------|-------------------| | `ltx2` | ~3–8 min | 6–16 min for 2 epochs | | `ltx23` | ~5–12 min | 10–25 min for 2 epochs | Always use `wait=0`. ## Cost LTX2 uses a formula-based cost (per-step area + clip count); LTX 2.3 is flat at 200 Buzz / epoch. ``` ltx2: total = epochs × computed_cost (formula varies with clip count + duration) ltx23: total = 200 × epochs ``` | Configuration | Buzz (training only) | |---------------|---------------------| | LTX2, `epochs: 2`, 4 clips | ~10–40 (depends on clip duration) + samples | | LTX 2.3, `epochs: 2` | 400 + samples | | LTX 2.3, `epochs: 5` | 1000 + samples | Sample-prompt rendering uses LTX2 video-generation rates and is billed separately. Run with `whatif=true` to see the exact pre-flight charge. ## Troubleshooting | Symptom | Likely cause | Fix | |---------|--------------|-----| | `400` with "trainingData.sourceUrl not reachable" | Signed URL expired, or zip behind auth | Regenerate the URL. R2 signed URLs default to 24h. | | Step `failed` with VRAM-related error | Resolution × clip length too high | Lower `resolution` (e.g. to `512`), shorten clips. | | LTX 2.3 cost surprises you | Flat 200 Buzz / epoch, by design | Check `whatif=true` before submitting. Cap `epochs` at 2–3 unless you have budget. | | Trained LoRA produces no motion | Too few epochs / static reference clips | Raise `epochs`, ensure clips show the motion you want learned. 
| | Step `failed`, `moderationStatus: "Rejected"` | Dataset failed content moderation | Replace flagged clips. | ## Related * [Wan video LoRA training](./training-wan) — Wan video LoRA training (preview) * [LTX2 video generation](./ltx2) — use a trained LoRA in LTX2 inference * [Flux 2 Klein LoRA training](./training-flux2-klein) — image-side counterpart * [Results & webhooks](/orchestration/guide/results-and-webhooks) * [`SubmitWorkflow`](/orchestration/reference/operations/SubmitWorkflow) / [`GetWorkflow`](/orchestration/reference/operations/GetWorkflow) * [Endpoint OpenAPI spec](https://orchestration.civitai.com/v2/consumer/recipes/training/openapi.yaml) --- --- url: /orchestration/mcp.md --- # Civitai Orchestration MCP Server The orchestrator is also exposed as a remote [Model Context Protocol](https://modelcontextprotocol.io) server, so any MCP-aware client — Claude Desktop, claude.ai, Claude Code, Cursor, VS Code — can call it directly. The MCP server wraps the same workflow engine as the REST API, with one tool per recipe family (image, video, audio, music, training, analysis, utilities) plus tools for managing workflows. If you already use the orchestrator via [REST](/orchestration/guide/getting-started), MCP gives you the same capabilities packaged for LLM agents: tools an agent can call, prompts that guide multi-step pipelines, and a resource scheme for fetching generated media inline. ## Endpoint ``` https://orchestration.civitai.com/mcp ``` Transport is **Streamable HTTP** — the modern MCP HTTP transport used by remote MCP servers in Claude Desktop and claude.ai. There is no binary to install; clients connect directly over HTTPS. ## Authentication The MCP server uses the **same Civitai API key** as the [REST API](/orchestration/guide/authentication). 
Send it as a Bearer token in the `Authorization` header on every request, in the form `Authorization: Bearer YOUR_CIVITAI_API_KEY`. Replace `YOUR_CIVITAI_API_KEY` with your own key wherever it appears in your MCP client config.

Most tools (`generate_image`, `generate_video`, `transcribe_audio`, …) accept anonymous calls, but tools that read or list per-user state (most notably `list_workflows`) require a token. Authenticated calls are also tracked against your account for usage and Buzz accounting, so you'll generally want one configured.

## Connecting

### Claude Desktop

Add the server to `~/.claude/config.json` (or use **Settings → Developer → Edit Config**) with the endpoint URL and the `Authorization` header, then restart Claude Desktop. The server appears in the MCP picker, and its tools become available in any conversation.

### claude.ai

In claude.ai, add a custom remote MCP server under **Settings → Connectors → Add custom connector**:

* **URL:** `https://orchestration.civitai.com/mcp`
* **Authentication:** Custom header `Authorization` with the value `Bearer YOUR_CIVITAI_API_KEY`

### Claude Code / Cursor / VS Code MCP

For Claude Code, register the endpoint as an HTTP MCP server and pass the `Authorization` header. For Cursor or VS Code with the MCP extension, add an entry of the same shape (URL plus `Authorization` header) to your `mcp.json`.

### Generic HTTP MCP clients

Any MCP client that speaks Streamable HTTP can connect: point it at `/mcp` and send the `Authorization` header. The server advertises full capabilities (`tools`, `prompts`, `resources`, all with `listChanged`) on `initialize`.
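As a rough illustration of what a generic client sends first, the sketch below builds an `initialize` request for the endpoint above using only the Python standard library. The JSON-RPC shape, the `protocolVersion` string, and the `Accept` header follow the public MCP specification as understood here, not anything stated on this page; the `clientInfo` values and the helper name `build_initialize_request` are hypothetical.

```python
import json
import urllib.request

MCP_URL = "https://orchestration.civitai.com/mcp"


def build_initialize_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) an MCP `initialize` request over Streamable HTTP."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumption: protocol revision taken from the MCP spec, not from this page.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


if __name__ == "__main__":
    req = build_initialize_request("YOUR_CIVITAI_API_KEY")
    print(req.get_method(), req.full_url)
    print(json.loads(req.data)["method"])
```

Passing the returned object to `urllib.request.urlopen` would actually send it; a real client would then read the advertised capabilities from the response before listing tools.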
## What's available * **Generation tools** — image, video, audio (TTS / transcription), music * **Media utilities** — upscale, convert, frame extraction * **Analysis** — caption, rate (NSFW / safety), tag * **LLM access** — `chat_completion` against any OpenRouter model * **Discovery** — `find_models` natural-language search across the catalog * **Workflow management** — submit raw workflow JSON, get / cancel / list workflows * **Prompts** — three built-in pipeline guides for common tasks * **Resources** — `spine://blobs/{blobId}` for inline retrieval of generated media See the [tools reference](/orchestration/mcp/tools) for the full catalog. ## Related * [Authentication](/orchestration/guide/authentication) — how to get and rotate a Civitai API key * [Recipes](/orchestration/recipes/) — REST examples for the same workflows the MCP tools wrap * [Tools, prompts, and resources](/orchestration/mcp/tools) — full MCP catalog --- --- url: /site/reference/model-versions.md description: Fetch a specific version of a Civitai model by ID or file hash. --- # Model versions A **model version** is a single release within a model — one set of files, a specific `baseModel`, its own stats, and its own AIR identifier. Models may have many versions; call these endpoints when you need a specific one. ## Get a model version ``` GET /api/v1/model-versions/{id} ``` **Auth:** Mixed. A valid token exposes a few extra fields (e.g. early-access data for resources the caller has unlocked). ### Path parameters | Name | Type | Description | |------|------|-------------| | `id` | integer | Model version ID. 
| ### Response ```json { "id": 2514310, "modelId": 827184, "name": "v16.0", "description": null, "baseModel": "Illustrious", "baseModelType": "Standard", "air": "urn:air:sdxl:checkpoint:civitai:827184@2514310", "status": "Published", "availability": "Public", "nsfwLevel": 3, "createdAt": "2025-12-18T08:55:00.000Z", "updatedAt": "2025-12-18T09:16:12.062Z", "publishedAt": "2025-12-18T09:16:12.062Z", "uploadType": "Created", "usageControl": "Download", "trainedWords": [], "earlyAccessConfig": null, "earlyAccessEndsAt": null, "trainingStatus": null, "trainingDetails": null, "stats": { "downloadCount": 215627, "thumbsUpCount": 13828 }, "model": { "name": "WAI-illustrious-SDXL", "type": "Checkpoint", "nsfw": false, "poi": false }, "files": [ /* see below */ ], "images": [ /* preview images, filtered by browsing level */ ], "downloadUrl": "https://civitai.com/api/download/models/2514310" } ``` Each entry in `files[]`: ```json { "id": 2402203, "name": "waiIllustriousSDXL_v160.safetensors", "type": "Model", "sizeKB": 6775430.35, "metadata": { "format": "SafeTensor", "size": "pruned", "fp": "fp16" }, "pickleScanResult": "Success", "virusScanResult": "Success", "hashes": { "AutoV1": "4748A7F6", "AutoV2": "A5F58EB1C3", "SHA256": "A5F58EB1C33616...", "CRC32": "DAEE95B7", "BLAKE3": "1A411D9B...", "AutoV3": "22D8CB95B807" }, "downloadUrl": "https://civitai.com/api/download/models/2514310", "primary": true } ``` Returns `404` if the version doesn't exist or isn't published (moderators bypass the published check). ### Notes * The `air` field is the canonical [AIR identifier](../guide/air). Forward it directly to the Orchestration API when you need to reference this resource in a workflow. * `images[]` respects the caller's browsing level — SFW-gated callers never see mature previews. On Civitai's "green" domain or from restricted regions, images are filtered to SFW regardless of session. * `files[]` only contains public files. Private / archived files are omitted. 
* `model.mode` appears as `Archived` or `TakenDown` when the parent model has been moderated. When archived, `files[]` and `downloadUrl` are dropped; when taken down, `images[]` is dropped as well. The field is omitted entirely on healthy models. * `stats` has only `downloadCount` and `thumbsUpCount` here — model-version-level metrics. Use [`GET /models/{id}`](./models#get-a-model) if you need the full set including comments and tipping. ### Example ```bash curl "https://civitai.com/api/v1/model-versions/2514310" | jq '{id, name, air, downloadUrl}' ``` ## Get a model version by file hash ``` GET /api/v1/model-versions/by-hash/{hash} ``` **Auth:** Public. Useful when you have a local file and want to identify the model without downloading anything from Civitai. Accepts any of the hash types Civitai records: `AutoV1`, `AutoV2`, `AutoV3`, `SHA256`, `BLAKE3`, or `CRC32`. The hash is matched case-insensitively. ### Path parameters | Name | Type | Description | |------|------|-------------| | `hash` | string | File hash. | ### Response Same shape as `GET /model-versions/{id}`. Returns `404` if no matching file is found, or the file belongs to an unpublished version. ### Example ```bash # Identify a local .safetensors by its SHA256 sha256sum model.safetensors # a5f58eb1c33616c4f06bca55af39876a7b817913cd829caa8acb111b770c85cc curl "https://civitai.com/api/v1/model-versions/by-hash/A5F58EB1C33616C4F06BCA55AF39876A7B817913CD829CAA8ACB111B770C85CC" \ | jq '{id, modelId, name, air}' ``` ## Bulk lookup by hash ``` POST /api/v1/model-versions/by-hash ``` **Auth:** Public. Same as `GET /by-hash/{hash}`, but takes up to **100** SHA256 hashes in a single request. Useful when scanning a directory of local files. Hashes shorter or longer than 64 characters are rejected (`400`); each must be the full SHA256. 
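A minimal sketch of preparing a directory scan for this endpoint: it hashes local files with SHA256, enforces the 64-character requirement, and splits the hashes into batches of at most 100. The helper names and the `*.safetensors` glob are illustrative assumptions, and uppercasing the digest simply mirrors the examples on this page. The sketch only prepares request bodies; it does not call the API.

```python
import hashlib
import json
from pathlib import Path
from typing import Iterable, Iterator, List

MAX_HASHES_PER_REQUEST = 100  # bulk by-hash limit stated above


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the uppercase hex SHA256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def batch_hashes(
    hashes: Iterable[str], size: int = MAX_HASHES_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield lists of at most `size` hashes, rejecting anything that is not 64 chars."""
    batch: List[str] = []
    for value in hashes:
        if len(value) != 64:
            raise ValueError(f"not a full SHA256 hash: {value!r}")
        batch.append(value)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    # Hypothetical layout: model files sitting in the current directory.
    files = sorted(Path(".").glob("*.safetensors"))
    for batch in batch_hashes(sha256_of_file(p) for p in files):
        # Each printed line is one JSON array body for POST /api/v1/model-versions/by-hash.
        print(json.dumps(batch))
```

Validating the length locally avoids spending a whole request on a `400` caused by one truncated hash.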
### Request body ```json [ "A5F58EB1C33616C4F06BCA55AF39876A7B817913CD829CAA8ACB111B770C85CC", "B7C9D1F2A3E4B5C6D7E8F9A0B1C2D3E4F5A6B7C8D9E0F1A2B3C4D5E6F7A8B9C0" ] ``` ### Response An array of model version objects, same shape as `GET /model-versions/{id}`. Hashes that don't match any file are silently dropped — the response can have fewer entries than the request. ```json [ { "id": 2514310, "modelId": 827184, "name": "v16.0", "...": "..." } ] ``` ### Errors | Status | Cause | |--------|-------| | `400` | Missing body, non-array, hash not 64 chars, or more than 100 entries. The error message lists the first parse failure. | ### Example ```bash curl -X POST -H "Content-Type: application/json" \ -d '["A5F58EB1...","B7C9D1F2..."]' \ "https://civitai.com/api/v1/model-versions/by-hash" ``` ::: tip If you only need the IDs (e.g. to feed back into the Orchestration API or to de-duplicate a download list), use the lighter [`/by-hash/ids`](#bulk-lookup-hash-id) endpoint below — it returns just `{modelVersionId, hash}` pairs and is cheaper. ::: ## Bulk lookup hash → ID {#bulk-lookup-hash-id} ``` POST /api/v1/model-versions/by-hash/ids ``` **Auth:** Public. Resolves SHA256 hashes to model version IDs only. Accepts up to **10,000** hashes per call. Use this when you don't need the full version object — e.g. to dedupe a download list or to map local files back to Civitai IDs in bulk. ### Request body ```json [ "A5F58EB1C33616C4F06BCA55AF39876A7B817913CD829CAA8ACB111B770C85CC", "B7C9D1F2A3E4B5C6D7E8F9A0B1C2D3E4F5A6B7C8D9E0F1A2B3C4D5E6F7A8B9C0" ] ``` ### Response ```json [ { "modelVersionId": 2514310, "hash": "A5F58EB1C33616C4F06BCA55AF39876A7B817913CD829CAA8ACB111B770C85CC" } ] ``` Unmatched hashes are silently dropped. ### Example ```bash # Map a manifest of local files to model version IDs jq -r '.files[].sha256' manifest.json \ | jq -R . | jq -s . 
\ | curl -X POST -H "Content-Type: application/json" -d @- \ "https://civitai.com/api/v1/model-versions/by-hash/ids" ``` ## Get a minimal model version ``` GET /api/v1/model-versions/mini/{id} ``` **Auth:** Mixed. A trimmed-down version of `GET /model-versions/{id}`, intended for clients that need the bare minimum to **download a file** or **identify whether the caller can generate** with it. Skips heavy fields like `images[]`, `description`, and the full `files[]` array. ### Path parameters | Name | Type | Description | |------|------|-------------| | `id` | integer | Model version ID. | ### Query parameters | Name | Type | Description | |------|------|-------------| | `epoch` | integer | For `Private` training-result versions, request a specific epoch's file. Falls back to the last epoch if omitted. | ### Response ```json { "air": "urn:air:sdxl:checkpoint:civitai:827184@2514310", "versionName": "v16.0", "modelName": "WAI-illustrious-SDXL", "baseModel": "Illustrious", "availability": "Public", "publishedAt": "2025-12-18T09:16:12.062Z", "size": 6775430.35, "fileType": "Model", "fileName": "waiIllustriousSDXL_v160.safetensors", "hashes": { "AutoV1": "4748A7F6", "AutoV2": "A5F58EB1C3", "SHA256": "A5F58EB1C33616...", "CRC32": "DAEE95B7", "BLAKE3": "1A411D9B...", "AutoV3": "22D8CB95B807" }, "downloadUrls": ["https://civitai.com/api/download/models/2514310"], "format": "SafeTensor", "canGenerate": true, "isFeatured": false, "requireAuth": false, "checkPermission": false, "earlyAccessEndsAt": null, "freeTrialLimit": null, "additionalResourceCharge": false, "minor": false, "sfwOnly": false } ``` ### Field notes | Field | Description | |-------|-------------| | `canGenerate` | `true` when the resource can be used in an Orchestration workflow for the calling user. Combines coverage, availability, and permission checks. | | `checkPermission` | `true` when the resource is gated (early-access window active, or `Private`). 
Pair with [`/permissions/check`](./permissions) for an explicit yes/no. | | `requireAuth` | When `true`, the `downloadUrls` require a token (passed as `Authorization: Bearer` or `?token=`). | | `earlyAccessEndsAt` | Only present when `checkPermission` is `true`. ISO timestamp when the early-access window ends. | | `freeTrialLimit` | Number of free generations allowed during early access, when configured. | | `additionalResourceCharge` | `true` when generating with this resource costs extra Buzz beyond the base workflow cost. | Returns `404` if the version doesn't exist, isn't published, the primary file is missing, or (for private training results) the requested `epoch` isn't found. ### Example ```bash # Just the download URL and SHA256, fast curl "https://civitai.com/api/v1/model-versions/mini/2514310" \ | jq '{air, downloadUrls, "sha256": .hashes.SHA256}' ``` --- --- url: /site/reference/models.md description: List and fetch Civitai models. --- # Models A **model** represents a trained AI resource published on Civitai — a checkpoint, LoRA, textual inversion, VAE, ControlNet, upscaler, etc. Each model has one or more [model versions](./model-versions) containing the actual files. ## List models ``` GET /api/v1/models ``` **Auth:** Mixed — the `favorites` and `hidden` params require a bearer token. ### Query parameters | Name | Type | Default | Description | |------|------|---------|-------------| | `limit` | integer (1–100) | 100 | Number of items per page. | | `page` | integer (≥ 1) | — | 1-indexed page number. Incompatible with `query`. | | `cursor` | string | — | Opaque pagination cursor. Use `metadata.nextCursor` from the previous response. | | `query` | string | — | Full-text search (Meilisearch). Requires cursor-based pagination. | | `ids` | comma-separated integers | — | Restrict to specific model IDs. | | `tag` | string | — | Filter by tag name. | | `username` | string | — | Filter by creator. Auto-slugified. 
|
| `types` | `ModelType` or `ModelType[]` | — | One or more of the values from `GET /enums` (`ModelType`). Repeat the param or comma-separate. |
| `baseModels` | string or string\[] | — | Filter by base model (e.g. `SDXL 1.0`, `Flux.1 D`). See `GET /enums` (`BaseModel`). |
| `checkpointType` | `Standard` \| `Trained` \| `Merge` | — | For checkpoint models only. |
| `sort` | `Highest Rated` \| `Most Downloaded` \| `Newest` \| ... | `Highest Rated` | See source for full list. |
| `period` | `AllTime` \| `Year` \| `Month` \| `Week` \| `Day` | `AllTime` | Time window for sort metrics. |
| `nsfw` | boolean | `false` | If `true`, include mature content. Ignored on SFW-gated regions. |
| `supportsGeneration` | boolean | — | Only return models supported by on-site generation. |
| `fromPlatform` | boolean | — | Only return models trained on Civitai. |
| `earlyAccess` | boolean | — | Include early-access versions. |
| `primaryFileOnly` | boolean | `false` | Drop non-primary files from each version's `files[]`. |
| `favorites` | boolean | `false` | *(auth required)* Only models in the caller's bookmark collection. |
| `hidden` | boolean | `false` | *(auth required)* Only models the caller has hidden. |

Unknown params are silently ignored after Zod parsing; invalid ones return `400`.

### Response

```json
{
  "items": [
    {
      "id": 827184,
      "name": "WAI-illustrious-SDXL",
      "description": "...",
      "type": "Checkpoint",
      "nsfw": false,
      "nsfwLevel": 31,
      "availability": "Public",
      "supportsGeneration": true,
      "allowNoCredit": true,
      "allowCommercialUse": "{Image,RentCivit}",
      "allowDerivatives": true,
      "allowDifferentLicense": true,
      "minor": false,
      "poi": false,
      "sfwOnly": false,
      "mode": null,
      "stats": {
        "downloadCount": 1272529,
        "thumbsUpCount": 79272,
        "thumbsDownCount": 202,
        "commentCount": 1931,
        "tippedAmountCount": 156742
      },
      "creator": {
        "username": "WAI0731",
        "image": "https://image.civitai.com/.../WAI0731.jpeg"
      },
      "tags": ["base model", "anime"],
      "modelVersions": [
        {
          "id": 2514310,
          "name": "v16.0",
          "baseModel": "Illustrious",
          "baseModelType": "Standard",
          "publishedAt": "2025-12-18T09:16:12.062Z",
          "supportsGeneration": true,
          "stats": { "downloadCount": 215627, "thumbsUpCount": 13828, "thumbsDownCount": 22 },
          "files": [
            {
              "id": 2402203,
              "name": "waiIllustriousSDXL_v160.safetensors",
              "type": "Model",
              "sizeKB": 6775430.35,
              "hashes": { "AutoV2": "A5F58EB1C3", "SHA256": "A5F58EB1C3...", "BLAKE3": "1A411D9B..." },
              "downloadUrl": "https://civitai.com/api/download/models/2514310",
              "primary": true,
              "metadata": { "format": "SafeTensor", "size": "pruned", "fp": "fp16" }
            }
          ],
          "images": [],
          "downloadUrl": "https://civitai.com/api/download/models/2514310"
        }
      ]
    }
  ],
  "metadata": {
    "nextCursor": "75363|932023|257749",
    "nextPage": "https://civitai.com/api/v1/models?limit=100&cursor=...",
    "currentPage": 1,
    "pageSize": 100
  }
}
```

When using `page` pagination, `metadata` additionally includes `currentPage` and `pageSize`. When using `cursor` pagination, those are omitted.

### Notes

* `page * limit` above 1000 returns `429`; use `cursor` for deep paging. See [Pagination](../guide/pagination).
* Including `query` without `cursor` is fine; combining `query` with `page` returns `400`.
* Only `Published` versions are returned to non-moderator callers. Files marked non-public by the uploader are hidden from `files[]`.
* `mode` is non-null when the parent model has been moderated.
Values: `Archived` (drops `files[]` and `downloadUrl`) and `TakenDown` (also drops `images[]`). Omitted entirely on healthy models. ### Example ```bash curl "https://civitai.com/api/v1/models?limit=5&types=LORA&baseModels=SDXL%201.0&sort=Most%20Downloaded" ``` ## Get a model ``` GET /api/v1/models/{id} ``` **Auth:** Public. ### Path parameters | Name | Type | Description | |------|------|-------------| | `id` | integer | Model ID. | ### Response Returns the same shape as a single item from the list endpoint — same top-level keys (`id`, `name`, `type`, `modelVersions`, `creator`, `tags`, `stats`, ...). Returns `404` if the model doesn't exist: ```json { "error": "No model with id 0" } ``` ### Example ```bash curl "https://civitai.com/api/v1/models/827184" ``` --- --- url: /orchestration/recipes/multi-speaker-dialogue.md --- # Multi-speaker dialogue The `audioMix` step overlays multiple audio clips on a single timeline, each placed at its own start offset with optional per-track volume and fades. Pair it with N `textToSpeech` steps to produce multi-speaker dialogue, debate, or audio drama — including overlap, interruption, and cross-talk that single-utterance TTS can't model on its own. ::: tip Why not a multi-speaker TTS engine? Qwen3 TTS synthesises one continuous utterance per request with no silence-injection or speaker switching. Asking the model to "say A, then pause, then say B" produces unpredictable prosody. Generating each line as its own short, clean TTS step and overlaying them with `audioMix` keeps every utterance natural while letting you place them anywhere on the output timeline — including overlapping intervals, which is the only way to get genuine cross-talk. ::: ## How it composes Every dialogue workflow has the same shape: 1. **One `textToSpeech` step per spoken line** — each step picks its own speaker (built-in voice, voice clone, or voice design) and produces a short clean clip. 2. 
**One trailing `audioMix` step** referencing each TTS output via `$ref`. By default, tracks play back-to-back in array order, so no timing math is required.

```json
{
  "steps": [
    { "$type": "textToSpeech", "name": "alice", "input": { /* ... */ } },
    { "$type": "textToSpeech", "name": "bob", "input": { /* ... */ } },
    {
      "$type": "audioMix",
      "input": {
        "tracks": [
          { "url": { "$ref": "alice", "path": "output.audioBlob.url" } },
          { "url": { "$ref": "bob", "path": "output.audioBlob.url" } }
        ]
      }
    }
  ]
}
```

### Timeline rules

Each track resolves a start time on the output timeline using these rules:

* **Implicit (default)**: track *i* starts when track *i-1* ends (in array order). No fields needed.
* **`offset`**: float in seconds, nudges the implicit position. Negative = overlap/interrupt; positive = gap. `offset: -0.5` means "start 500 ms before the previous track ends".
* **`startSeconds`**: absolute timeline anchor. When set, the track plays at exactly this time **and is excluded from the implicit chain**, which suits music beds. Other tracks in the array compute their implicit position as if anchored tracks weren't there.

If both `startSeconds` and `offset` are set on the same track, `startSeconds` wins.

The output is a single Ogg Vorbis blob plus a `tracks[]` array echoing each input's resolved `startSeconds` and probed `duration`, which is convenient for rendering subtitles or speaker highlights without re-probing.

## Sequential reading

Three speakers, each clip placed after the previous one ends. No overlap; any gaps between clips (from a positive `offset`) are silent.

## Crosstalk and interruption

A speaker starts before the previous one finishes. ffmpeg's `amix` sums the overlapping samples, so the two voices are audible simultaneously. A small `fadeInMs` on the interrupter softens the entry.

For a "hot debate" effect, set `offset: -0.3` to `-1.0` on each interrupter; negative offsets pull the track earlier on the timeline.
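The timeline rules described earlier (implicit chaining, `offset`, and `startSeconds`) can be sketched as a small resolver. This is an illustrative reading of those rules, not the server's implementation: the `Track` class and `resolve_starts` function are hypothetical names, durations are assumed to be known up front, and the sketch assumes that a track following an offset track chains from that track's *resolved* end.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Track:
    duration: float                        # probed clip length in seconds
    offset: float = 0.0                    # nudge relative to the implicit position
    start_seconds: Optional[float] = None  # mirrors `startSeconds`; wins over offset


def resolve_starts(tracks: List[Track]) -> List[float]:
    """Return the resolved start time of each track, in array order."""
    starts: List[float] = []
    cursor = 0.0  # end time of the previous non-anchored track
    for track in tracks:
        if track.start_seconds is not None:
            # Anchored tracks sit at an absolute time and do not move the cursor.
            starts.append(track.start_seconds)
            continue
        start = cursor + track.offset
        starts.append(start)
        cursor = start + track.duration
    return starts


if __name__ == "__main__":
    tracks = [
        Track(duration=60.0, start_seconds=0.0),  # music bed, outside the chain
        Track(duration=5.0),                      # first speaker
        Track(duration=4.0, offset=-0.5),         # interrupts 500 ms early
        Track(duration=3.0, offset=1.0),          # one-second pause first
    ]
    print(resolve_starts(tracks))  # [0.0, 0.0, 4.5, 9.5]
```

The anchored bed leaves the speech chain untouched, so the first speaker still starts at `0.0` even though the bed comes first in the array.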
Use mild attenuation (`volumeDb: -1` to `-3`) on whichever speaker should sit slightly back in the mix.

## Adding a music or ambience bed

The `url` field also accepts a direct URL string (no `$ref` needed), so you can drop in static background music or ambience under a voice track.

A typical bed sits at `volumeDb: -18` (well under speech), fades in over 500 ms, and fades out over 1.5 s. Keep beds at -15 dB or lower against speech.

## Input fields

### `audioMix` step

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `tracks` | ✅ | — | Array of tracks to overlay. At least one. |
| `normalize` | | `false` | When `true`, ffmpeg's `amix` divides the summed signal by the number of tracks (N) to avoid clipping. Keep `false` when you've set per-track `volumeDb` and want the levels you specified. |
| `maxDurationSeconds` | | `600` | Server-side cap on output length. The job fails early if the union of track intervals exceeds this. |

### Per-track fields

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `url` | ✅ | — | Either a direct `"https://..."` URL string, or a `{ "$ref": "<stepName>", "path": "output.audioBlob.url" }` object referencing a prior step's output by its step name. |
| `startSeconds` | | implicit | Absolute timeline anchor. Set this to pin a track to a fixed time (music bed, ambience). When set, the track is taken out of the implicit-sequencing chain. When unset, the track plays after the previous non-anchored track ends. |
| `offset` | | `0` | Seconds to nudge this track from its implicit position. Negative = overlap/interrupt; positive = gap. Ignored when `startSeconds` is set. |
| `volumeDb` | | `0` | Per-track gain in dB. `-3` roughly halves signal power, which is only a slight drop in perceived loudness (about `-10` sounds half as loud); `-18` is a typical music-bed level. |
| `fadeInMs` | | `0` | Linear fade-in length applied at the track's resolved start. |
| `fadeOutMs` | | `0` | Linear fade-out applied at the track's tail (resolved start + duration − fadeOutMs).
| ## Reading the result ```json { "status": "succeeded", "steps": [ { "name": "opener", "$type": "textToSpeech", "output": { "audioBlob": { /* ... */ } } }, { "name": "pro", "$type": "textToSpeech", "output": { "audioBlob": { /* ... */ } } }, { "name": "con", "$type": "textToSpeech", "output": { "audioBlob": { /* ... */ } } }, { "name": "3", "$type": "audioMix", "status": "succeeded", "output": { "audioBlob": { "id": "ZXNS7C...ogg", "url": "https://orchestration-new.civitai.com/v2/consumer/blobs/ZXNS7C...ogg?sig=...", "duration": 18.2 }, "tracks": [ { "startSeconds": 0.0, "duration": 5.7 }, { "startSeconds": 5.7, "duration": 5.9 }, { "startSeconds": 11.6, "duration": 6.2 } ] } } ] } ``` * **`audioBlob.url`** — signed URL for the mixed Ogg Vorbis output. Stream it directly in an `