Skip to content

Recipes

Task-oriented, end-to-end examples. Each recipe walks through a real workflow: what to send, what you get back, common parameter tweaks, and troubleshooting.

Video

  • WAN video generation — all WAN versions (2.1–2.7) across FAL, Comfy, and Civitai, with text-to-video, image-to-video, reference-to-video, and edit-video operations
  • LTX2 video generation — Lightricks LTX2 and LTX2.3 on Comfy, including the new videoToVideo (style transfer) and audioToVideo (talking-head) operations
  • Kling video generation — Kuaishou Kling (v1/v1.5/v1.6/v2/v2.5-turbo with camera control) and Kling V3 (5 operations, multi-prompt, audio, video-to-video)
  • Vidu video generation — Vidu 2.0 (flat 600 Buzz, anime style, first-last-frame) and Vidu Q3 (per-second pricing, 4 resolution tiers, turbo mode, native audio)
  • Veo 3 video generation — Google Veo 3.0/3.1 in standard / fast / lite tiers; operation inferred from image count; optional synchronized audio track
  • Grok video generation — xAI Grok-Imagine-Video via FAL; text-to-video, image-to-video, and edit-video with 480p/720p output
  • HunyuanVideo generation — Tencent HunyuanVideo on Comfy workers; text-to-video with LoRA support; compute-intensive, always use wait=0
  • Video upscaling — FlashVSR, 2–4× with a 2560 px output cap
  • Video frame interpolation — VFIMamba, 2× or 3× frame-count, smooths generated or low-FPS footage
  • Compose media (video) — overlay/stack videos on a canvas, place audio over a clip, picture-in-picture; the video form of composeMedia

Image

  • Flux 2 image generation — Flux.2 Klein (default, cheap + capable, 4b/9b, supports createVariant) plus Dev / Flex / Pro / Max for higher-fidelity and commercial tiers
  • Flux 1 image generation — Flux.1 through sdcpp (default, minimal required input) or Comfy, plus the BFL-hosted flux1-kontext editing tier
  • Z-Image generation — lightweight text-to-image on sdcpp; turbo (default, distilled, extremely fast + cheap) or base when you need more fidelity
  • Qwen image generation — Qwen-Image 20B on sdcpp (default) or FAL-hosted Qwen2 with a Pro tier; supports createImage + createVariant + editImage
  • MAI Image 2.5 image generation — Microsoft MAI Image 2.5 via FAL; text-to-image only, eleven aspect ratios (incl. auto), flat per-image pricing
  • Anima image generation — anime-tuned sdcpp ecosystem with built-in diffuser, LoRA support, createImage only
  • ERNIE image generation — Baidu ERNIE Image on Comfy; ernie standard + turbo distilled variant, built-in diffuser, LoRA support, createImage only
  • SDXL image generation — Stable Diffusion XL at 1024² native via sdcpp (default) or Comfy, with createImage + createVariant
  • SD1 image generation — classic Stable Diffusion 1.5 at 512² via sdcpp (default) or Comfy, with createImage + createVariant
  • OpenAI image generation — GPT-Image 1 / 1.5 and DALL·E 2 / 3 via OpenAI's hosted API
  • Google image generation — Imagen 4 and Nano Banana Pro / 2 via Vertex AI, with editing + web-search grounding
  • Gemini image generation — Gemini 2.5 Flash Image (same product as Nano Banana) via the direct Gemini API
  • Seedream image generation — ByteDance Seedream v3 / v4 / v4.5 / v5.0-lite with native up-to-4096 output + editing
  • Grok image generation — xAI Grok with wide aspect-ratio menu (21 options) + editing
  • WAN image generation — WAN v2.2 / v2.2-5b / v2.5 / v2.7 via FAL (image counterpart to the WAN video recipe)
  • Image upscaling — ESRGAN-family upscalers, chain after imageGen or use standalone

Audio

  • Transcription — Qwen3-ASR, multilingual, word-level timestamps for captioning
  • Text-to-speech — built-in speakers with optional style prompt, or voice cloning from a reference clip
  • Multi-speaker dialogue — overlay TTS clips on a shared timeline for debate, interview, or audio-drama scenes; the audio form of composeMedia
  • ACE-Step music generation — full songs from a style description + structured lyrics, 2B turbo default with optional 4B XL overrides; audio-only MP3 or MP4 with a still cover image

Language models

  • Chat completion — any OpenRouter model or Civitai AIR model, vision inputs, tool use, streaming, image generation via modalities: ["image"]; OpenAI-compatible /v1/chat/completions endpoint or workflow step

Utilities

  • Prompt enhancement — LLM rewrites a user prompt for a target ecosystem (Flux / SDXL / SD1 / LTX2), returns issues + recommendations + enhanced prompt
  • Image conversion — format conversion (JPEG / PNG / WebP / GIF), resize, and region blur; flat 1 Buzz

Training

Train a LoRA on your own dataset using AI Toolkit. You control training length with steps, the number of saved checkpoints with epochs, and can resume from an existing LoRA with continueFrom. All training runs are async — submit with wait=0 and follow up via polling or a webhook. Cost is steps × costPerStep + epochs × a per-epoch surcharge with an 80%-of-default floor (rates vary per ecosystem — see each page); use whatif=true to preview the exact charge.

Copy-paste runnable

All recipes target https://orchestration.civitai.com and use <your-token> as a placeholder for your Bearer token. Drop them into curl, HTTPie, VS Code's REST Client, or any tool that speaks HTTP.

Civitai Developer Documentation