OpenAI image generation
The orchestrator routes OpenAI image requests to OpenAI's hosted APIs via the imageGen step. Five models, each with its own behaviour and quality tier:
| Model | Operations | Notes |
|---|---|---|
| gpt-image-2 | createImage / editImage | Default — latest GPT-Image model. Arbitrary width/height (not fixed presets), optional maskImage for regional edits. No background control. |
| gpt-image-1.5 | createImage / editImage | Previous flagship. Fixed size enum, 4 images max, quality + background controls. |
| gpt-image-1 | createImage / editImage | Older GPT-Image. Up to 10 images per call. Supports background: "transparent". |
| dall-e-3 | createImage only | Stand-alone natural vs vivid style control, standard / hd quality, up to 1792 px. |
| dall-e-2 | createImage / editImage | Legacy. Only supports square outputs (256² / 512² / 1024²) and 1000-char prompts. Use only for compatibility reasons. |
Default choice for new integrations: model: "gpt-image-2". It's OpenAI's latest flagship with flexible output dimensions and mask-based editing. Fall back to gpt-image-1.5 if you need background control or prefer the fixed size enum; gpt-image-1 for transparent backgrounds or quantity > 4; dall-e-3 for style-controlled vivid output; avoid dall-e-2 unless you specifically need it.
Prerequisites
- A Civitai orchestration token (Quick start → Prerequisites)
- For editImage: one or more source image URLs, data URLs, or Base64 strings
- For gpt-image-2 mask-based edits: a mask image whose fully transparent pixels (alpha = 0) indicate the region to edit
gpt-image-2 (default)
```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-2",
      "operation": "createImage",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "width": 1024,
      "height": 1024,
      "quantity": 1,
      "quality": "high"
    }
  }]
}
```
Parameters
| Field | Default | Allowed | Notes |
|---|---|---|---|
| prompt | — (required) | ≤ 32 000 chars | Natural-language works best. |
| width | 1024 | 256–3840, multiple of 16 | Explicit width in pixels. |
| height | 1024 | 256–3840, multiple of 16 | Explicit height in pixels. |
| quantity | 1 | 1–4 | |
| quality | high | low / medium / high | Drives pricing (see Cost below). |
| outputFormat | jpeg | jpeg / png / webp | Inherited from the imageGen step. |
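If you compute dimensions dynamically, a small client-side guard keeps them inside the accepted range before submitting. `snap_dimension` is a hypothetical helper, not part of any SDK, and the orchestrator's own rounding behaviour (if any) is not documented here — this just pre-validates on the client:

```python
def snap_dimension(px: int) -> int:
    """Clamp a requested dimension into gpt-image-2's accepted range
    (256-3840) and round to the nearest multiple of 16."""
    px = max(256, min(3840, px))
    return int((px + 8) // 16) * 16
```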
Editing (editImage)
```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-2",
      "operation": "editImage",
      "prompt": "Make it a winter scene with snow falling",
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```
In edit mode width / height are optional. When both are omitted, the output size is inferred from the input images (image_size: "auto" is sent to the model). To force explicit output dimensions, set both fields.
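The both-or-neither rule is easy to violate when building request bodies programmatically. A hypothetical builder (`edit_input` is not part of any SDK) that enforces it might look like:

```python
def edit_input(prompt, images, width=None, height=None, **extra):
    """Build a gpt-image-2 editImage input dict.

    Enforces the both-or-neither rule for width/height: omitting both
    lets the model infer the output size from the input images.
    """
    if (width is None) != (height is None):
        raise ValueError("set both width and height, or neither")
    body = {
        "engine": "openai",
        "model": "gpt-image-2",
        "operation": "editImage",
        "prompt": prompt,
        "images": list(images),
        **extra,
    }
    if width is not None:
        body["width"] = width
        body["height"] = height
    return body
```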
To restrict the edit to a specific region, pass maskImage — a URL, data URL, or Base64 string where fully transparent pixels (alpha = 0) mark the area to modify:
```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-2",
      "operation": "editImage",
      "prompt": "Replace the background with a tropical beach",
      "images": ["https://image.civitai.com/.../source.jpeg"],
      "maskImage": "https://image.civitai.com/.../mask.png"
    }
  }]
}
```
Only the first image in images[] is masked; additional reference images are ignored by the mask.
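Because images[] and maskImage accept data URLs, you can inline local files without hosting them first. A minimal encoding helper (`to_data_url` is a hypothetical client-side function, stdlib only):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL suitable for the
    images[] or maskImage fields."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

For masks, remember the content requirement above: the PNG's fully transparent pixels (alpha = 0) mark the region to edit.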
gpt-image-1.5
```http
POST https://orchestration.civitai.com/v2/consumer/workflows?wait=60
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-1.5",
      "operation": "createImage",
      "prompt": "A photorealistic portrait of a woman with flowers in her hair, golden hour lighting",
      "size": "1024x1024",
      "quantity": 1,
      "quality": "high"
    }
  }]
}
```
Parameters
| Field | Default | Allowed | Notes |
|---|---|---|---|
| prompt | — (required) | ≤ 32 000 chars | Natural-language works best. |
| size | 1024x1024 | 1024x1024 / 1536x1024 / 1024x1536 | Exact pixel dimensions as a string, not width/height. |
| quantity | 1 | 1–4 | Lower cap than gpt-image-1 (which allows up to 10). |
| background | auto | auto / transparent / opaque | Transparent backgrounds require PNG-compatible output. |
| quality | high | low / medium / high | 1.5 doesn't expose auto; pick one explicitly. |
Editing (editImage)
```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-1.5",
      "operation": "editImage",
      "prompt": "Make it a winter scene with snow falling",
      "size": "1024x1024",
      "images": [
        "https://image.civitai.com/.../source.jpeg"
      ]
    }
  }]
}
```
Pass one or more source images in images[]. The Edit shape also accepts a mask (string URL/DataURL/Base64) if you want to inpaint a specific region.
gpt-image-1 (older GPT-Image, transparent backgrounds)
Same shape as gpt-image-1.5 but older weights. Reach for it when you need:
- quantity > 4 (up to 10)
- background: "transparent" (1.5 also supports it, but older clients may already be wired for 1)
- quality: "auto"
```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "gpt-image-1",
      "operation": "createImage",
      "prompt": "A stylized logo mark for a coffee brand, simple vector illustration",
      "size": "1024x1024",
      "quantity": 1,
      "background": "transparent",
      "quality": "high"
    }
  }]
}
```
size, background, images[], and mask work exactly like 1.5; the only differences are that quantity goes up to 10 and quality accepts auto as a fourth option.
dall-e-3 (style-controlled, natural vs vivid)
dall-e-3 is create-only and exposes a style dimension the GPT-Image models don't:
```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "dall-e-3",
      "operation": "createImage",
      "prompt": "A majestic fantasy landscape with floating islands",
      "size": "1024x1024",
      "style": "vivid",
      "quality": "hd"
    }
  }]
}
```
| Field | Default | Allowed | Notes |
|---|---|---|---|
| prompt | — (required) | ≤ 4 000 chars | DALL·E 3 silently rewrites short/vague prompts — write verbose directives for control. |
| size | — (required) | 1024x1024 / 1792x1024 / 1024x1792 | Required field on DALL·E 3 (unlike GPT-Image, which has a default). |
| style | vivid | natural / vivid | vivid pushes hyper-real colours; natural stays closer to the prompt. |
| quality | auto | auto / standard / hd | All quality tiers cost the same Buzz (300 flat per image); hd increases runtime only. |
DALL·E 3 doesn't support editImage or multiple samples — quantity isn't on the schema.
dall-e-2 (legacy)
Only include for legacy compatibility. Prompt ≤ 1000 chars, square outputs only, and much lower output quality than newer models:
```json
{
  "steps": [{
    "$type": "imageGen",
    "input": {
      "engine": "openai",
      "model": "dall-e-2",
      "operation": "createImage",
      "prompt": "A vintage postcard illustration of a mountain town",
      "size": "512x512",
      "quantity": 1
    }
  }]
}
```
| Field | Allowed | Notes |
|---|---|---|
| size | 256x256 / 512x512 / 1024x1024 | Square only. |
| prompt | ≤ 1000 chars | Much tighter than newer models. |
| quantity | 1–10 | |
Supports editImage with image (a single source string, not an array), but prefer GPT-Image 1 or 1.5 unless you have a specific reason not to.
Reading the result
All models emit the standard imageGen output:
```json
{
  "status": "succeeded",
  "steps": [{
    "name": "0",
    "$type": "imageGen",
    "status": "succeeded",
    "output": {
      "images": [
        { "id": "blob_...", "url": "https://.../signed.png" }
      ]
    }
  }]
}
```

Blob URLs are signed and expire — refetch the workflow or call GetBlob for a fresh URL.
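A small sketch for pulling the signed URLs out of a finished workflow response (`image_urls` is a hypothetical helper; it assumes only the output shape shown above):

```python
def image_urls(workflow: dict) -> list[str]:
    """Collect the signed image URLs from every imageGen step
    in a workflow response."""
    return [
        img["url"]
        for step in workflow.get("steps", [])
        if step.get("$type") == "imageGen"
        for img in step.get("output", {}).get("images", [])
    ]
```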
Runtime
OpenAI's API queue is the dominant factor — Civitai routes your request straight through. Typical wall times:
| Model | Per-image wall time | wait recommendation |
|---|---|---|
| dall-e-2 | 3–8 s | wait=30 fine |
| dall-e-3 (standard) | 10–20 s | wait=60 fine |
| dall-e-3 (hd) | 15–40 s | wait=60 usually fine |
| gpt-image-1 / 1.5 | 10–30 s per image | wait=60 fine for quantity: 1; fall back to wait=0 for batches |
| gpt-image-2 | 15–45 s per image (larger dims take longer) | wait=60 fine for quantity: 1; wait=0 + poll for batches or 4K |
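For jobs where wait expires, the usual pattern is to submit with wait=0 and poll GetWorkflow until a terminal status. A sketch of the polling loop, with the fetch call injected so it works with any HTTP client; the terminal status names other than "succeeded" are assumptions:

```python
import time

def poll_workflow(fetch, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll until the workflow reaches a terminal status.

    `fetch` is any callable returning the workflow JSON as a dict
    (e.g. a GET to the workflow's URL); it is injected here so the
    loop can be exercised without a network.
    """
    deadline = time.monotonic() + timeout
    while True:
        workflow = fetch()
        # "succeeded" is shown in the docs; "failed"/"canceled" are assumed.
        if workflow.get("status") in ("succeeded", "failed", "canceled"):
            return workflow
        if time.monotonic() >= deadline:
            raise TimeoutError("workflow did not finish in time")
        sleep(interval)
```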
Cost
Billed in Buzz on the workflow's transactions. Use whatif=true for an exact preview; see Payments (Buzz) for currency selection.
gpt-image-2 (size-aware)
Unlike the earlier GPT-Image models, gpt-image-2 prices scale with output dimensions as well as quality. The orchestrator finds the cheapest canonical tier whose dimensions cover your requested width/height (the comparison is rotation-aware, so a landscape tier also covers the matching portrait request), then bills at that tier:
| Dimensions | low | medium | high |
|---|---|---|---|
| 1024 × 768 | 13 | 52 | 195 |
| 1024 × 1024 | 13 | 78 | 286 |
| 1024 × 1536 | 13 | 65 | 221 |
| 1920 × 1080 | 13 | 52 | 208 |
| 2560 × 1440 | 13 | 78 | 299 |
| 3840 × 2160 | 26 | 143 | 533 |
All values are Buzz per image. Final cost is tier × quantity, plus any priority / output-format surcharges applied by the imageGen step. Requests above 3840 × 2160 clamp to that row; requests smaller than 1024 × 768 floor to the cheapest row.
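The lookup can be sketched client-side for rough budgeting. One caveat: taken literally, "cheapest covering tier" can undercut the table's own examples (the 1920 × 1080 tier covers a 1024 × 1024 request at a lower high price), so this sketch breaks ties by choosing the smallest covering tier, which reproduces the published example costs — that tie-break is an assumption, and whatif=true remains the authoritative preview:

```python
# Canonical gpt-image-2 pricing tiers (Buzz per image), from the table above.
TIERS = [
    (1024, 768,  {"low": 13, "medium": 52,  "high": 195}),
    (1024, 1024, {"low": 13, "medium": 78,  "high": 286}),
    (1024, 1536, {"low": 13, "medium": 65,  "high": 221}),
    (1920, 1080, {"low": 13, "medium": 52,  "high": 208}),
    (2560, 1440, {"low": 13, "medium": 78,  "high": 299}),
    (3840, 2160, {"low": 26, "medium": 143, "high": 533}),
]

def estimate_buzz(width, height, quality="high", quantity=1):
    """Rough per-request Buzz estimate for gpt-image-2 (hypothetical helper)."""
    def covers(tw, th):
        # Rotation-aware: a tier covers the request in either orientation.
        return (tw >= width and th >= height) or (tw >= height and th >= width)

    covering = [(tw * th, prices[quality]) for tw, th, prices in TIERS if covers(tw, th)]
    if not covering:
        # Larger than the biggest tier: clamp to the 3840x2160 row.
        return TIERS[-1][2][quality] * quantity
    _, price = min(covering)  # smallest covering tier (assumed tie-break)
    return price * quantity
```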
In edit mode with width/height omitted (image_size: "auto"), we estimate cost from the first input image's dimensions.
Other models (flat per-quality)
total = base × quantity

| Model | quality | Base (Buzz per image) |
|---|---|---|
| gpt-image-1.5 | low | 25 |
| gpt-image-1.5 | medium | 100 |
| gpt-image-1.5 | high (default) | 375 |
| gpt-image-1 | low | 25 |
| gpt-image-1 | medium | 100 |
| gpt-image-1 | high | 375 |
| gpt-image-1 | auto | 300 |
| dall-e-3 / dall-e-2 | (any) | 300 |
Examples:
- gpt-image-2 high, 1024×1024, quantity: 1 → ~286 Buzz
- gpt-image-2 high, 1920×1080, quantity: 2 → ~416 Buzz
- gpt-image-2 medium, 3840×2160, quantity: 1 → ~143 Buzz
- gpt-image-1.5 high, quantity: 1 → ~375 Buzz
- dall-e-3 hd, quantity: 1 → ~300 Buzz
For the older models size, background, and style don't change the Buzz price — only quality (GPT-Image) and quantity do. For gpt-image-2, width / height also affect the price via the tier lookup above.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| 400 with "size must be one of" | Sent 1536x1024 to DALL·E 3 (it wants 1792x1024) or 1792x1024 to GPT-Image | Match the size enum for the model you're using — the tables above list each model's allowed set. |
| 400 with "style is not a valid property" | Sent style outside DALL·E 3 | Only DALL·E 3 exposes style. |
| 400 with "quantity must be ≤ 4" on gpt-image-1.5 | Using 1.5 with gpt-image-1's higher quantity ceiling | Drop to quantity: 4 or use gpt-image-1 (up to 10). |
| 400 with "prompt too long" on dall-e-2 | DALL·E 2's 1000-char prompt cap | Trim the prompt or move to a newer model. |
| Output is unexpectedly stylised on DALL·E 3 | style: "vivid" default | Set style: "natural" for closer-to-prompt output. |
| Output is PNG when you wanted JPEG | Transparent backgrounds force PNG | Set background: "opaque" if you want JPEG, or leave outputFormat unset and accept whatever OpenAI returns. |
| Request timed out (wait expired) | Large quantity or hd on DALL·E 3 during busy periods | Resubmit with wait=0 and poll. |
| Step failed, reason = "blocked" | OpenAI's content filter or Civitai moderation | Don't retry the same input — see Errors & retries → Step-level failures. |
Related
- SubmitWorkflow — operation used by every example here
- GetWorkflow — for polling
- Google image generation — alternative commercial tier with Imagen 4 and Nano Banana
- Gemini image generation — Google's Gemini 2.5 Flash Image direct-API route
- Flux 2 / Flux 1 / Qwen — open-weights alternatives on Civitai-hosted workers
- Image upscaling — chain after imageGen for higher-res output
- Prompt enhancement — LLM-rewrite a prompt before feeding it in via $ref
- Full parameter catalog: the OpenAIGpt1CreateImageInput, OpenAIGpt1EditImageInput, OpenAIGpt15CreateImageInput, OpenAIGpt15EditImageInput, OpenAIDallE3CreateImageGenInput, OpenAIDallE2CreateImageGenInput, and OpenAIDallE2EditImageInput schemas in the API reference
- imageGen endpoint OpenAPI spec — standalone OpenAPI 3.1 YAML covering the full imageGen surface