In AI image generation, resolution and aspect ratio settings are critical parameters that directly affect image quality. If not set correctly, they can cause human anatomy distortion or composition breakdown. This article systematically covers everything from recommended sizes per model to optimal aspect ratios by use case and upscaling techniques.
Why Resolution and Aspect Ratio Matter
The Relationship Between a Model’s Training Resolution and Generation Quality
AI image generation models are trained on images at a specific resolution. For example, Stable Diffusion 1.5 was trained on 512×512 pixel images, and SDXL on 1024×1024 pixel images. Generating at sizes significantly different from the training resolution can degrade quality.
Specifically, specifying a size much larger than the training resolution can cause the same subject to appear multiple times within the frame, or human body proportions to break down.
The Influence of Aspect Ratio on Composition
The aspect ratio (width-to-height ratio) strongly influences the composition of the generated image. A square (1:1) format tends to produce head-and-shoulders compositions, though full-body or long shots are also possible depending on the prompt. A landscape (16:9) format tends to produce wider compositions that include more background. Choosing an aspect ratio suited to your purpose is the quickest way to get the composition you intend.
Recommended Resolutions by Model
Each model has a base resolution determined by its training, and the recommended sizes follow from it.
SD 1.5
| Aspect ratio | Resolution | Use case |
|---|---|---|
| 1:1 | 512×512 | Base size |
| 2:3 | 512×768 | For portraits |
| 3:2 | 768×512 | For landscapes and horizontal compositions |
SD 1.5’s training resolution is 512×512. Quality degradation becomes prominent once either side exceeds 768 pixels, so use an upscaler when larger images are needed.
SDXL
| Aspect ratio | Resolution | Use case |
|---|---|---|
| 1:1 | 1024×1024 | Base size |
| 2:3 | 832×1216 | For portraits |
| 3:2 | 1216×832 | For landscapes and horizontal compositions |
| 9:16 | 768×1344 | For smartphone wallpapers |
| 16:9 | 1344×768 | For wide compositions |
SDXL is trained with 1024×1024 as the baseline, and resolution combinations that keep the total pixel count around 1 megapixel (approximately 1 million pixels) are stable.
Note: Each resolution is an approximation chosen to keep the total pixel count around 1MP, so it may differ slightly from the exact aspect ratio.
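The "around 1MP" rule can be sketched as a small calculation. This is only one possible heuristic, not an official SDXL algorithm: the function name, the 64-pixel step, and the rounding choices (round the short side, then fit the long side under the pixel budget) are assumptions made for illustration. With these choices it happens to reproduce the 1:1, 2:3, 3:2, 9:16, and 16:9 rows of the SDXL table above.

```python
import math

def size_near_budget(aspect_w, aspect_h, budget=1024 * 1024, step=64):
    """Return (width, height) for an aspect ratio, staying at or under `budget` pixels.

    Assumption: both sides are snapped to multiples of `step` (64 here).
    """
    portrait = aspect_w < aspect_h
    # Work with short/long sides so portrait and landscape behave symmetrically.
    short_r, long_r = sorted((aspect_w, aspect_h))
    ideal_short = math.sqrt(budget * short_r / long_r)
    # Round the short side to the nearest multiple of `step`.
    short = max(step, round(ideal_short / step) * step)
    # Fit the long side under the pixel budget, rounding down to a multiple of `step`.
    long = max(step, (budget // short) // step * step)
    return (short, long) if portrait else (long, short)

print(size_near_budget(2, 3))   # (832, 1216)
print(size_near_budget(16, 9))  # (1344, 768)
```

Other ratios may come out differently from the sizes listed in this article, because the result depends on the step size and rounding direction chosen.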
SD3 / SD3.5
| Aspect ratio | Resolution | Use case |
|---|---|---|
| 1:1 | 1024×1024 | Base size |
The SD3 series is designed around 1024×1024. When changing aspect ratios, you can use SDXL resolutions as a guide.
Flux
| Aspect ratio | Resolution | Use case |
|---|---|---|
| 1:1 | 1024×1024 | Base size |
| Any | ~1MP total pixels | Free ratio |
Flux is a model with high aspect ratio flexibility. Keeping total pixel count around 1 million pixels yields stable quality across a wide range of aspect ratios.
Recommended Aspect Ratios by Use Case
The optimal aspect ratio varies depending on where you plan to use the generated image.
| Use case | Aspect ratio | SDXL recommended resolution | Notes |
|---|---|---|---|
| Social media posts (Instagram, etc.) | 1:1 | 1024×1024 | Ideal for feed posts |
| Social media posts (Instagram, etc.) | 4:5 | 896×1120 | Portrait posts occupy more screen space |
| Blog thumbnails | 16:9 | 1344×768 | Also suitable as OGP images |
| Portraits | 2:3 | 832×1216 | Fits upper-body to full-body framing naturally |
| PC wallpapers | 16:9 | 1344×768 | Intended to be upscaled afterward |
| Ultra-wide wallpapers | 21:9 | 1536×640 | Nearest size with both sides divisible by 8; upscaling required |
| Smartphone wallpapers | 9:16 | 768×1344 | Intended to be upscaled afterward |
For wallpaper use cases, the common approach is to output at the resolutions above and then enlarge to the final resolution using an upscaler described below.
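As a rough sketch of the arithmetic involved, the following computes how much a generated image must be enlarged to cover a target display. The function name and the 3840×2160 target are assumptions chosen for illustration; the article does not specify final wallpaper resolutions.

```python
def plan_upscale(gen_size, target_size):
    """Return (scale_factor, scaled_size) so the scaled image fully covers the target."""
    gen_w, gen_h = gen_size
    target_w, target_h = target_size
    # Use the larger of the two per-axis factors so neither side falls short.
    factor = max(target_w / gen_w, target_h / gen_h)
    scaled = (round(gen_w * factor), round(gen_h * factor))
    return factor, scaled

# Hypothetical example: SDXL 16:9 output enlarged for a 3840x2160 display.
factor, scaled = plan_upscale((1344, 768), (3840, 2160))
print(round(factor, 3), scaled)  # 2.857 (3840, 2194)
```

Because 1344×768 is not exactly 16:9, the scaled image is slightly taller than the target and needs a small crop to reach the final size.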
What Happens When Generating at Sizes Different from Training Resolution?
Common Problems
Specifying sizes significantly different from the training resolution tends to cause the following issues:
- Anatomy distortion: Human body proportions break down
- Composition breakdown: Unintended zoom, or the same subject repeated within the frame
- Detail breakdown: Abnormal finger counts and facial distortion become more frequent
These problems are particularly pronounced when specifying 1024×1024 or larger directly in SD 1.5.
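To put a number on "significantly different", one simple measure is the ratio of requested pixels to training pixels. The helper below is a hypothetical illustration, not a threshold taken from any tool.

```python
def pixel_ratio(width, height, train_side):
    """Requested pixel count divided by the model's square training pixel count."""
    return (width * height) / (train_side * train_side)

print(pixel_ratio(1024, 1024, 512))  # 4.0
print(pixel_ratio(768, 512, 512))    # 1.5
```

Requesting 1024×1024 from SD 1.5 asks for four times the pixels it was trained on, while the recommended 768×512 asks for only 1.5 times.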
Solution: Using Hires.fix
Hires.fix (High Resolution Fix) is a feature that first generates an image at the training resolution, then upscales it and runs denoising again. This produces high-resolution images while suppressing composition breakdown. It works in three steps:
- Generate at the training resolution (e.g., 512×512) to fix the composition
- Upscale by the specified multiplier (e.g., 2x)
- Run denoising again on the upscaled image at the set denoise strength
A denoise strength of 0.4–0.6 is typical. Too low leaves blurriness; too high changes the composition.
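The bookkeeping behind these settings can be sketched as follows. This does not generate images or call any WebUI API. It only derives the second-pass size and classifies the denoise strength against the range given above. The function name, the returned keys, and snapping to multiples of 8 are assumptions for illustration.

```python
def hires_fix_plan(base_w, base_h, scale=2.0, denoise=0.5, multiple=8):
    """Describe the two passes of a Hires.fix-style workflow (sizes only)."""
    def snap(value):
        # Assumption: output sides are kept divisible by `multiple`.
        return int(round(value / multiple)) * multiple

    if denoise < 0.4:
        note = "below typical range: result may stay blurry"
    elif denoise > 0.6:
        note = "above typical range: composition may change"
    else:
        note = "typical range"

    return {
        "first_pass": (base_w, base_h),
        "second_pass": (snap(base_w * scale), snap(base_h * scale)),
        "denoise": denoise,
        "note": note,
    }

print(hires_fix_plan(512, 768, scale=1.5)["second_pass"])  # (768, 1152)
```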
Upscaling Techniques
Here are the main methods for making generated images even higher resolution.
Hires.fix
As described above, this is the built-in upscaling feature used during generation. It comes standard in WebUIs like AUTOMATIC1111 and Forge. No additional installation is required and it’s easy to use, but VRAM consumption increases.
Ultimate SD Upscale
An extension that combines img2img with tile splitting. By splitting the image into tiles (small areas) and processing them in order, large images can be generated while keeping VRAM usage down. If tile boundary seams are noticeable, adjust the overlap settings.
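The idea of tile splitting with overlap can be illustrated with a generic layout calculation. This is not the extension's actual tiling logic. The function name and the default tile size and overlap are assumptions chosen for the sketch.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes that cover the image with overlap."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]

print(len(tile_boxes(2048, 2048, tile=512, overlap=0)))   # 16
print(len(tile_boxes(2048, 2048, tile=512, overlap=64)))  # 25
```

A larger overlap gives neighboring tiles more shared context, which tends to hide seams, at the cost of processing more tiles.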
Tiled Diffusion (MultiDiffusion)
A method that splits the image into tiles and denoises each tile in parallel. Similar in purpose to Ultimate SD Upscale, but differs in that the diffusion process itself is done at the tile level. VRAM consumption can be further reduced by combining with Tiled VAE.
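A heavily simplified sketch of tile-level processing: each overlapping window is processed independently, and the results are averaged wherever windows overlap. Real implementations work on 2D latents at every sampling step with an actual diffusion model. Here a 1D list and a stand-in `denoise_tile` callback are used purely for illustration, and all names are hypothetical.

```python
def fuse_tiles(signal, window, stride, denoise_tile):
    """Apply denoise_tile to overlapping windows and average the overlaps."""
    n = len(signal)
    totals = [0.0] * n
    counts = [0] * n
    starts = list(range(0, max(n - window, 0) + 1, stride))
    if starts[-1] + window < n:
        starts.append(n - window)  # make sure the tail is covered
    for start in starts:
        result = denoise_tile(signal[start:start + window])
        for offset, value in enumerate(result):
            totals[start + offset] += value
            counts[start + offset] += 1
    return [total / count for total, count in zip(totals, counts)]

# Stand-in "denoiser": replaces each tile with its mean value.
def flatten(tile):
    mean = sum(tile) / len(tile)
    return [mean] * len(tile)

print(fuse_tiles([0, 0, 4, 4, 8, 8], window=4, stride=2, denoise_tile=flatten))
# [2.0, 2.0, 4.0, 4.0, 6.0, 6.0]
```

Only one tile needs to be held in memory at a time, which is where the VRAM saving comes from in this simplified picture.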
External Upscalers
A method that uses AI-based super-resolution models to enlarge images after generation.
| Tool | Features |
|---|---|
| Real-ESRGAN | Versatile, supports both photorealistic images and illustrations |
| 4x-UltraSharp | Strong detail enhancement |
| SwinIR | Swin Transformer-based upscaler |
| Topaz Gigapixel AI | Commercial software, easy-to-use GUI |
External upscalers are independent of the generation process and have the advantage of being applicable to any model’s generated images.
Summary
Resolution and aspect ratio settings are fundamental elements that influence AI image generation quality.
- Match the model’s training resolution: this is the basic principle for stable quality
- Choose an aspect ratio suited to your use case to make the intended composition easier to get
- Use upscalers when high resolution is needed: generate near the training resolution, then enlarge
It’s recommended to start by generating at each model’s recommended resolution, then adjust the aspect ratio according to your use case.


