Why z-image-turbo? The AI Image Generation Model Chosen for Speed and NSFW Support

Among the many AI image generation models out there, z-image-turbo stands out with two key strengths — “speed” and “NSFW support” — making it the ideal model for generating realistic human images in particular.

This article explains why z-image-turbo is worth choosing by comparing it against other models.

Features of z-image-turbo

1. Overwhelmingly Fast

z-image-turbo’s biggest feature is generation speed.

The following is a rough guide for generating one 1024×1024 image on an RTX 4090 (24GB VRAM):

Model | Generation time per image (approx.) | Required steps
z-image-turbo | ~3–5 seconds | 8 steps
SDXL | 15–30 seconds | 20–30 steps
Flux.1 dev | 20–40 seconds | 20–30 steps
Stable Diffusion 1.5 | 5–15 seconds | 20–30 steps

Generation time varies greatly with GPU performance and image size; on an 8GB-VRAM GPU, expect roughly 2–5× longer.

Being able to generate high-quality images in just 8 steps means you can iterate through prompt experiments rapidly.

For work that involves repeatedly “changing the prompt slightly and trying again,” 3 seconds per image vs 30 seconds per image is a completely different experience. Over 10 attempts, that’s 30 seconds vs 5 minutes.
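The iteration arithmetic above in a couple of lines (times taken from the table; real runs vary by hardware):

```python
def batch_time(seconds_per_image, attempts):
    """Total wall-clock seconds for repeated prompt-tweaking attempts."""
    return seconds_per_image * attempts

# z-image-turbo (~3 s) vs SDXL upper bound (~30 s) over 10 attempts
print(batch_time(3, 10), batch_time(30, 10))  # 30 vs 300 seconds (5 minutes)
```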

2. Supports NSFW Image Generation

When you want to generate NSFW (Not Safe For Work) images, the first decision is whether to use a cloud service or run the model locally.

Cloud services like DALL-E 3 (OpenAI) and Midjourney prohibit NSFW content generation in their terms of service. Therefore, generating NSFW content requires running models yourself on a local PC or cloud GPU.

Additionally, even among locally-runnable models, NSFW support varies by model.

Category | Service/Model | NSFW Support
Cloud service | DALL-E 3 (OpenAI) | Completely prohibited
Cloud service | Midjourney | Prohibited
Local model | z-image-turbo | Unrestricted
Local model | Stable Diffusion (official) | Depends on model (license restrictions)
Local model | Flux.1 schnell | With safety filter

z-image-turbo has no safety filter, and the license (Apache 2.0) contains no explicit prohibition on NSFW, making it highly expressive. It can handle a wide range from realistic human images to artistic works.

3. Excels at Realistic Japanese Women

z-image-turbo excels at photorealistic human images, particularly portraits of Asian women. It consistently renders natural skin texture and Japanese facial features, with strong depiction of hair and facial expressions.

Lineage of AI Image Generation Models

To understand z-image-turbo, let’s first organize the lineage of AI image generation models.

Stable Diffusion family (LDM family)

A family of models based on Latent Diffusion Models (LDM).

2022  Stable Diffusion 1.x  ← LDM paper (Rombach et al.)
  ↓   U-Net + CLIP + VAE
2023  Stable Diffusion 2.x  ← Changed to OpenCLIP
  ↓
2023  SDXL                  ← Larger U-Net + CLIP×2 dual encoder
  ↓
2024  Stable Diffusion 3    ← Migrated to MMDiT (Transformer-based)

Technical features:

  • Diffusion runs in a compressed latent space via a VAE (hence “Latent” Diffusion)
  • U-Net denoiser conditioned on text through a CLIP encoder
  • Classifier-Free Guidance (CFG) with full negative prompt support

Flux family

A next-generation model announced in 2024 by Black Forest Labs, founded by the original Stable Diffusion authors (Robin Rombach, Andreas Blattmann, Patrick Esser).

2024  FLUX.1 [pro]     ← API only, highest quality
      FLUX.1 [dev]     ← Non-commercial, guidance distillation
      FLUX.1 [schnell] ← Apache-2.0, timestep distillation (4 steps)

Key technical advances over SD family:

Element | Stable Diffusion (1.x–SDXL) | FLUX.1
Denoiser | U-Net (CNN) | MMDiT (Transformer)
Text encoder | CLIP only | CLIP + T5 (dual)
Diffusion method | Diffusion (DDPM) | Flow Matching
Parameter count | ~2.6B (SDXL) | 12B
Text understanding | 75-token limit | 512 tokens supported

Flow Matching is a successor to the traditional Diffusion objective that learns the mapping from noise to a clean image more efficiently. If Diffusion is a “random walk,” Flow Matching learns a “shortest path close to a straight line.”
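As a toy illustration (not the actual FLUX training code), the Flow Matching setup can be sketched in a few lines of numpy: the training input is a straight-line interpolation between a clean sample and noise, and the network's regression target is the constant velocity along that line.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))  # stand-in for clean image latents
x1 = rng.normal(size=(4, 8))  # pure Gaussian noise
t = 0.3                       # a training timestep in [0, 1]

# Straight-line path from data (t=0) to noise (t=1)
xt = (1 - t) * x0 + t * x1

# Flow Matching regression target: the velocity of that path,
# which is constant and independent of t
target_velocity = x1 - x0

# A model that predicts this velocity can move from any point on the
# path back to the data along a straight line, in few integration steps
reconstructed_x0 = xt - t * target_velocity
assert np.allclose(reconstructed_x0, x0)
```

The straightness of the learned path is what lets Flow Matching models get away with far fewer sampling steps than a DDPM-style random walk.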

The addition of the T5 text encoder enables understanding of long prompts beyond CLIP’s 75-token limit.

z-image-turbo’s Position

z-image-turbo is a 6B-parameter realism-focused model with the following characteristics:

  • Distilled model capable of high-quality generation in 8 steps
  • Operates at CFG=1.0 (guidance is built into the model)
  • No NSFW restrictions
  • English and Chinese support
  • Reference image guidance support (Z-Image Base)

Distillation is a technique that transfers a trained teacher model's behavior into a student that is cheaper to run. z-image-turbo needs only 8 steps because the original model's multi-step inference has been compressed through distillation. The same principle allows Flux.1 schnell to operate in 4 steps.
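As a toy analogy (a linear ODE, not an actual image model): a “teacher” that takes many small Euler steps can be matched by a “student” that collapses the whole trajectory into a single update, because the composition of the small steps reduces to one multiplier. Step distillation trains a network to approximate exactly this kind of multi-step composition.

```python
def teacher(x, num_steps=50, dt=0.02):
    """Many small Euler steps of dx/dt = -x (the 'slow' sampler)."""
    for _ in range(num_steps):
        x = x * (1 - dt)
    return x

def student(x, num_steps=50, dt=0.02):
    """One step that 'distills' the whole trajectory into a single multiplier."""
    return x * (1 - dt) ** num_steps

x = 10.0
# The student reproduces the teacher's endpoint without iterating
assert abs(teacher(x) - student(x)) < 1e-9
```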

Model Selection Guide

"Want to quickly generate realistic images"
  → z-image-turbo (8 steps, NSFW supported)

"Pursuing the highest image quality"
  → FLUX.1 dev (50 steps, 12B parameters)

"Want open-source with free customization"
  → SDXL (rich LoRA/FineTune ecosystem)

"Want to run locally with low resource requirements"
  → SD 1.5 family (low VRAM support)

Comprehensive Comparison with Other Models

Basic Performance

Comparison | z-image-turbo | SDXL | Flux.1 dev | SD 1.5
Generation speed | ◎ (8 steps) | △ (20–30 steps) | △ (50 steps) | ○ (20–30 steps)
Image quality | – | – | – | –
NSFW support | ◎ (unrestricted) | ○ (model-dependent) | △ (license restriction) | ○ (model-dependent)
Realistic people | ◎ | – | – | –
Required VRAM | Medium | High (6GB+) | Very high (~50GB) | Low (4GB+)
Parameter count | 6B | 2.6B | 12B | 0.9B

Architecture

Comparison | z-image-turbo | SDXL | Flux.1 dev | SD 1.5
Text encoder | – | CLIP×2 | CLIP + T5 | CLIP
Denoiser | – | U-Net | MMDiT | U-Net
Diffusion method | – | Diffusion | Flow Matching | Diffusion
ComfyUI support | ○ | ○ | ○ | ○
LoRA ecosystem | Few | ◎ (very abundant) | Growing | ◎ (very abundant)

Negative Prompt & img2img Support

This is an often-overlooked but important point in model selection.

Feature | z-image-turbo | SDXL | Flux.1 dev | Flux.1 schnell | SD 1.5
Negative prompts | △ (see below) | ◎ | △ (see below) | × | ◎
img2img | × (not supported) | ◎ | ○ | ○ | ◎
Inpainting | × | ○ | ◎ (Fill) | ◎ (Fill) | ○
ControlNet | × | ◎ | ○ (Canny, Depth) | – | ◎

Negative Prompt Support Status

Negative prompts are based on the Classifier-Free Diffusion Guidance (CFG) mechanism. For CFG to work, the model needs to be able to perform both conditional and unconditional predictions.

SD 1.5 / SDXL: Full support

Uses conventional CFG (guidance_scale around 7–12). The negative prompt is used in place of the unconditional prediction and clearly takes effect. Negative prompts work most effectively in SD-family models because this CFG mechanism operates without modification.

Flux.1 dev: Limited

Flux.1 dev is a “guidance-distilled” model with CFG baked into the model itself through distillation (guidance_scale=3.5). Standard negative prompts basically don’t work. However, using the true_cfg_scale parameter in diffusers forces conventional CFG, enabling negative prompts (inference cost doubles).

Flux.1 schnell: Not supported

The model operates at guidance_scale=0 due to timestep distillation, so the CFG mechanism itself cannot be used. Negative prompts have no effect.

z-image-turbo: Doesn’t work at CFG=1.0

z-image-turbo is designed to operate at CFG=1.0. CFG=1.0 means “no guidance,” so negative prompts do not work. While it’s possible to set negative prompt fields in ComfyUI workflows, we have confirmed they have no effect on output.
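The CFG arithmetic behind the last few subsections can be sketched directly (a minimal numpy illustration of the formula, not any library's internals): the guided prediction is uncond + scale × (cond − uncond), and at scale 1.0 the unconditional (negative-prompt) term cancels exactly, which is why negative prompts do nothing in z-image-turbo.

```python
import numpy as np

def cfg(cond, uncond, scale):
    """Classifier-Free Guidance: combine conditional and unconditional
    (or negative-prompt) noise predictions into one guided prediction."""
    return uncond + scale * (cond - uncond)

cond = np.array([1.0, 2.0, 3.0])    # prediction conditioned on the prompt
uncond = np.array([0.5, 0.5, 0.5])  # prediction for the negative prompt

# SD 1.5 / SDXL: scale around 7-12, so the negative prompt visibly
# pushes the result away from the unconditional direction
guided = cfg(cond, uncond, 7.5)

# z-image-turbo: scale fixed at 1.0, the uncond term cancels exactly,
# so the negative prompt cannot influence the output
assert np.allclose(cfg(cond, uncond, 1.0), cond)
```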

img2img Support Status

img2img (generating a new image based on an existing one) works by using the input image, with a small amount of noise added, as the starting point of denoising instead of pure random noise.

SD 1.5 / SDXL: Full support

The degree of change from the original image can be controlled with the denoise parameter (0.0–1.0). denoise=0.3 gives output close to the original; denoise=0.8 is nearly a new generation. Precise control combined with ControlNet (Canny, Depth, OpenPose, etc.) is also possible.
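A minimal sketch of that mechanism (toy numpy code with a simplified linear noise schedule, not a real scheduler): the denoise strength controls both how much noise is mixed into the input image and how many denoising steps are actually run.

```python
import numpy as np

rng = np.random.default_rng(0)

def prepare_img2img(latent, denoise, num_steps=30):
    """Simplified img2img setup: noise the input in proportion to `denoise`
    and return how many of the scheduler's steps will actually run.
    (Real schedulers use their own noise schedules; this is a toy version.)"""
    noise = rng.normal(size=latent.shape)
    noisy = np.sqrt(1 - denoise) * latent + np.sqrt(denoise) * noise
    steps_to_run = int(num_steps * denoise)
    return noisy, steps_to_run

latent = np.ones((4, 4))
_, steps_low = prepare_img2img(latent, 0.3)   # close to the original
_, steps_high = prepare_img2img(latent, 0.8)  # nearly a new generation
assert steps_low == 9 and steps_high == 24

# denoise=0 adds no noise and changes nothing
unchanged, _ = prepare_img2img(latent, 0.0)
assert np.allclose(unchanged, latent)
```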

Flux.1: Supported

Provided as task-specific derivative models: Flux.1 Fill (Inpainting), Flux.1 Canny (structure control), Flux.1 Depth (depth control), Flux.1 Redux (image transformation), Flux.1 Kontext (image editing).

z-image-turbo: Not supported (txt2img only)

z-image-turbo only supports text-to-image generation (txt2img). img2img, Inpainting, and ControlNet are not available.

Comprehensive Model Selection Guide

Taking the above feature differences into account:

"Want to mass-produce realistic images quickly with txt2img"
  → z-image-turbo (best for speed + NSFW)

"Want to use negative prompts to refine quality"
  → SDXL (CFG works most effectively)

"Want to edit/process existing images (img2img, Inpainting)"
  → SDXL or Flux.1 Fill (z-image-turbo not supported)

"Want to control pose and composition with ControlNet"
  → SDXL (most mature ecosystem) or Flux.1 Canny/Depth

"Want to give detailed instructions with long prompts"
  → Flux.1 dev (T5 encoder, 512 tokens supported)

"Pursuing the highest image quality"
  → Flux.1 dev (12B parameters, but needs ~50GB VRAM)

"Want to run locally with low resource requirements"
  → SD 1.5 family (4GB VRAM+, abundant LoRAs)

Overall, z-image-turbo is a choice specialized for “fast txt2img generation,” “NSFW support,” and “realistic people.” For work that needs img2img or ControlNet, combine it with SDXL or Flux.1.

License, Commercial Use, and Explicit Content

Checking the license is essential when using models. Models that simultaneously satisfy all three requirements of “commercial use OK,” “no NSFW restrictions,” and “negative prompt support” are actually very rare.

Models Meeting the Requirements

Model | License | Commercial | NSFW restriction | Neg prompt | img2img | Notes
Z-Image (full) | Apache 2.0 | ◎ | Not stated | ◎ | Unconfirmed | Blog recommended. CFG 3.0–5.0, 28–50 steps
Z-Image Turbo | Apache 2.0 | ◎ | Not stated | △ (CFG=1.0) | × | Fast version. 8 steps
Kolors (Kuaishou) | Apache 2.0 + registration | △ (registration required) | Ambiguous | – | – | Commercial use requires application. UNet + ChatGLM3

Z-Image (full) is the full version of the same model family as Z-Image Turbo used on this blog. While Turbo specializes in 8-step fast generation through distillation, the full version performs inference at CFG 3.0–5.0 with 28–50 steps, and negative prompts work completely.

Models Not Meeting Requirements

Research into major models found that the following models fail to meet one or more of the three requirements:

Model | Reason for ineligibility
SDXL | CreativeML OpenRAIL++-M. Commercial use allowed but NSFW restriction interpretation is ambiguous
SD 1.5 | CreativeML OpenRAIL-M. Prohibits “non-consensual sexual content”
SD 3.5 | Revenue restrictions; negative prompts only partially supported
SDXL Turbo | Non-commercial license; negative prompts not supported
FLUX.1 dev | Non-commercial license (paid contract required); NSFW restriction plus mandatory filter implementation
FLUX.1 schnell | Negative prompts not supported (CFG=0 distilled model)
FLUX.2 klein 4B | Negative prompts not supported
Qwen-Image | Negative prompts effectively not supported

License Comparison Table

Item | Z-Image | SD 1.5 | SDXL | Flux.1 dev | Flux.1 schnell
License | Apache 2.0 | OpenRAIL-M | OpenRAIL++-M | Non-Commercial | Apache 2.0
Commercial use | ◎ | ○ | ○ | × | ◎
NSFW restriction | None | △ | △ | × (filter required) | –
Neg prompt | ◎ (full) / △ (turbo) | ◎ | ◎ | △ (possible with true_cfg) | ×
Minor content | Strictly prohibited | Strictly prohibited | Strictly prohibited | Strictly prohibited | Strictly prohibited

Common to all models: Generating sexual content involving minors is strictly prohibited by law regardless of license.

This Blog’s Choice

This blog recommends the Z-Image family:

  • For rapid mass generation → Z-Image Turbo (8 steps, negative prompts limited)
  • For pursuing quality → Z-Image full (28–50 steps, negative prompts fully supported)

Both use the Apache 2.0 license with no commercial use restrictions and no explicit prohibition on NSFW content.

Disclaimer: Interpretation of licenses does not constitute legal advice. When using commercially, verify the full text of each license and consult a legal professional if needed. Always comply with the laws of each country (obscenity laws, child pornography prohibition laws, etc.).

How to Use z-image-turbo

There are three main ways to use z-image-turbo.

Method 1: ConoHa AI Canvas

A Japanese cloud service used entirely in the browser. No environment setup needed — you can start generating images with z-image-turbo immediately.

  • From ¥990/month
  • ComfyUI available
  • Japanese UI

For detailed setup instructions, see the ConoHa AI Canvas Getting Started Guide.

Method 2: ComfyUI Workflow

With ComfyUI, you can fine-tune all parameters of z-image-turbo. We distribute workflows with negative prompt settings already configured.

z-image-turbo ComfyUI Workflow Distribution

Method 3: RunPod Serverless (For Advanced Users)

For advanced users who want API-based bulk generation and automation, setting up on RunPod Serverless is recommended.

For details, see the Complete Guide to Running z-image-turbo on RunPod Serverless.

Prompt Tips

Writing prompts well is important for generating good images with z-image-turbo.

Basic rules:

  • Word order matters — elements written first are reflected most strongly
  • Emphasis syntax — (element:1.3) strengthens specific elements
  • Negative prompts — remove unwanted elements to improve quality (limited effect in z-image-turbo at CFG=1.0; fully effective in the full Z-Image model)

For details, see Prompt Basics.
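As a toy illustration of the emphasis syntax (a simplified take on the common (element:1.3) convention, not ComfyUI's actual parser), each weighted group can be split out with a regex, with unweighted text defaulting to 1.0:

```python
import re

def parse_emphasis(prompt):
    """Split a prompt into (text, weight) pairs.
    '(element:1.3)' gets weight 1.3; everything else defaults to 1.0."""
    parts = []
    pattern = re.compile(r"\(([^():]+):([\d.]+)\)")
    pos = 0
    for m in pattern.finditer(prompt):
        before = prompt[pos:m.start()].strip(" ,")
        if before:
            parts.append((before, 1.0))
        parts.append((m.group(1), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, 1.0))
    return parts

print(parse_emphasis("(masterpiece:1.3), 1girl, smile"))
# [('masterpiece', 1.3), ('1girl, smile', 1.0)]
```

Real UIs also handle nesting and bare parentheses; this sketch only covers the explicit `(text:weight)` form.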

Generation Cost

Because z-image-turbo is fast, the cost per image is low, which is another appeal.

With ConoHa AI Canvas

Plan | Monthly | Images | Per image
Entry | ¥990 | 500 | ~¥2
Standard | ¥1,980 | 1,500 | ~¥1.3

With RunPod Serverless

GPU-time-based billing. Because z-image-turbo generates in 8 steps:

  • About ¥0.5–1.5 per image (varies by GPU and instance size)
  • Well-suited for bulk generation
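The per-image figure follows from simple arithmetic (the rate below is a hypothetical example, not a quoted RunPod price): cost per image is generation time multiplied by the per-second GPU rate.

```python
def cost_per_image(seconds_per_image, yen_per_gpu_hour):
    """Per-image cost on time-billed GPU: generation time x hourly rate."""
    return seconds_per_image * yen_per_gpu_hour / 3600

# Hypothetical example: 4 s/image on a GPU billed at 900 yen/hour
print(round(cost_per_image(4, 900), 2))  # 1.0
```

Because the 8-step model cuts seconds_per_image by roughly an order of magnitude versus 20–50-step models, the per-image cost drops proportionally.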

For detailed cost comparison, see the Cloud GPU Comparison.

Summary

Reasons to choose z-image-turbo:

  1. Fast — 8 steps, ~3–5 seconds per generation. Rapid iteration through trial and error
  2. NSFW supported — No safety filter, high expressive freedom
  3. Realistic human portrayal — Particularly excels at Japanese women
  4. Low cost — Faster speed = shorter GPU time = lower cost

Next Steps