Recipes
Task-oriented, end-to-end examples. Each recipe walks through a real workflow: what to send, what you get back, common parameter tweaks, and troubleshooting.
Video
- WAN video generation — all WAN versions (2.1–2.7) across FAL, Comfy, and Civitai, with text-to-video, image-to-video, reference-to-video, and edit-video operations
- LTX2 video generation — Lightricks LTX2 and LTX2.3 on Comfy, including the new videoToVideo (style transfer) and audioToVideo (talking-head) operations
- Kling video generation — Kuaishou Kling (v1/v1.5/v1.6/v2/v2.5-turbo with camera control) and Kling V3 (5 operations, multi-prompt, audio, video-to-video)
- Vidu video generation — Vidu 2.0 (flat 600 Buzz, anime style, first-last-frame) and Vidu Q3 (per-second pricing, 4 resolution tiers, turbo mode, native audio)
- Veo 3 video generation — Google Veo 3.0/3.1 in standard / fast / lite tiers; operation inferred from image count; optional synchronized audio track
- Grok video generation — xAI Grok-Imagine-Video via FAL; text-to-video, image-to-video, and edit-video with 480p/720p output
- HunyuanVideo generation — Tencent HunyuanVideo on Comfy workers; text-to-video with LoRA support; compute-intensive, always use
wait=0 - Video upscaling — FlashVSR, 2–4× with a 2560 px output cap
- Video frame interpolation — VFIMamba, 2× or 3× frame-count, smooths generated or low-FPS footage
- Compose media (video) — overlay/stack videos on a canvas, place audio over a clip, picture-in-picture; the video form of
composeMedia
Image
- Flux 2 image generation — Flux.2 Klein (default, cheap + capable, 4b/9b, supports createVariant) plus Dev / Flex / Pro / Max for higher-fidelity and commercial tiers
- Flux 1 image generation — Flux.1 through sdcpp (default, minimal required input) or Comfy, plus the BFL-hosted
flux1-kontextediting tier - Z-Image generation — lightweight text-to-image on sdcpp;
turbo(default, distilled, extremely fast + cheap) orbasewhen you need more fidelity - Qwen image generation — Qwen-Image 20B on sdcpp (default) or FAL-hosted Qwen2 with a Pro tier; supports createImage + createVariant + editImage
- MAI Image 2.5 image generation — Microsoft MAI Image 2.5 via FAL; text-to-image only, eleven aspect ratios (incl.
auto), flat per-image pricing - Anima image generation — anime-tuned sdcpp ecosystem with built-in diffuser, LoRA support, createImage only
- ERNIE image generation — Baidu ERNIE Image on Comfy;
erniestandard +turbodistilled variant, built-in diffuser, LoRA support, createImage only - SDXL image generation — Stable Diffusion XL at 1024² native via sdcpp (default) or Comfy, with createImage + createVariant
- SD1 image generation — classic Stable Diffusion 1.5 at 512² via sdcpp (default) or Comfy, with createImage + createVariant
- OpenAI image generation — GPT-Image 1 / 1.5 and DALL·E 2 / 3 via OpenAI's hosted API
- Google image generation — Imagen 4 and Nano Banana Pro / 2 via Vertex AI, with editing + web-search grounding
- Gemini image generation — Gemini 2.5 Flash Image (same product as Nano Banana) via the direct Gemini API
- Seedream image generation — ByteDance Seedream v3 / v4 / v4.5 / v5.0-lite with native up-to-4096 output + editing
- Grok image generation — xAI Grok with wide aspect-ratio menu (21 options) + editing
- WAN image generation — WAN v2.2 / v2.2-5b / v2.5 / v2.7 via FAL (image counterpart to the WAN video recipe)
- Image upscaling — ESRGAN-family upscalers, chain after
imageGenor use standalone
Audio
- Transcription — Qwen3-ASR, multilingual, word-level timestamps for captioning
- Text-to-speech — built-in speakers with optional style prompt, or voice cloning from a reference clip
- Multi-speaker dialogue — overlay TTS clips on a shared timeline for debate, interview, or audio-drama scenes; the audio form of
composeMedia - ACE-Step music generation — full songs from a style description + structured lyrics, 2B turbo default with optional 4B XL overrides; audio-only MP3 or MP4 with a still cover image
Language models
- Chat completion — any OpenRouter model or Civitai AIR model, vision inputs, tool use, streaming, image generation via
modalities: ["image"]; OpenAI-compatible/v1/chat/completionsendpoint or workflow step
Utilities
- Prompt enhancement — LLM rewrites a user prompt for a target ecosystem (Flux / SDXL / SD1 / LTX2), returns issues + recommendations + enhanced prompt
- Image conversion — format conversion (JPEG / PNG / WebP / GIF), resize, and region blur; flat 1 Buzz
Training
Train a LoRA on your own dataset using AI Toolkit. You control training length with steps, the number of saved checkpoints with epochs, and can resume from an existing LoRA with continueFrom. All training runs are async — submit with wait=0 and follow up via polling or a webhook. Cost is steps × costPerStep + epochs × a per-epoch surcharge with an 80%-of-default floor (rates vary per ecosystem — see each page); use whatif=true to preview the exact charge.
- SDXL & SD1 LoRA training — classic Stable Diffusion ecosystems (500 Buzz default); cheapest pick for first fine-tunes
- Flux 1 LoRA training — Flux.1 Dev or Schnell (2000 Buzz default); higher quality, fixed BFL base checkpoints
- Flux 2 Klein LoRA training — Flux 2 Klein 4b / 9b (500 / 1000 Buzz default), including image-edit training mode with control reference images
- Wan video LoRA training — preview ecosystem for Wan 2.1 / 2.2 video LoRAs (3000 Buzz default)
- LTX2 video LoRA training — Lightricks LTX2 and LTX 2.3 video LoRAs (2750 Buzz default, 3000-step runs)
- Chroma / ERNIE / Qwen / Z-Image LoRA training — five smaller image ecosystems consolidated into one page; each section is independently runnable
Copy-paste runnable
All recipes target https://orchestration.civitai.com and use <your-token> as a placeholder for your Bearer token. Drop them into curl, HTTPie, VS Code's REST Client, or any tool that speaks HTTP.