AI Image Generation Resolution & Aspect Ratio Guide

In AI image generation, resolution and aspect ratio settings are critical parameters that directly affect image quality. If not set correctly, they can cause human anatomy distortion or composition breakdown. This article systematically covers everything from recommended sizes per model to optimal aspect ratios by use case and upscaling techniques.

Why Resolution and Aspect Ratio Matter

The Relationship Between a Model’s Training Resolution and Generation Quality

AI image generation models are trained on images at a specific resolution. For example, Stable Diffusion 1.5 was trained on 512×512 pixel images, and SDXL on 1024×1024 pixel images. Generating at sizes significantly different from the training resolution can degrade quality.

In particular, requesting a size much larger than the training resolution can cause the same subject to appear several times within the frame, or human body proportions to break down.

The Influence of Aspect Ratio on Composition

The aspect ratio (width-to-height ratio) strongly influences the composition of the generated image. A square (1:1) tends to produce head-and-shoulders (bust shot) compositions, though full-body or long shots are also possible depending on the prompt. A landscape (16:9) format tends to produce wider compositions that include more background. Choosing an aspect ratio suited to your purpose is the quickest way to get the composition you intend.

Recommended Resolutions by Model

Each model has a base resolution set at training time, and the recommended sizes follow from it.

SD 1.5

Aspect ratio	Resolution	Use case
1:1	512×512	Base size
2:3	512×768	For portraits
3:2	768×512	For landscapes and horizontal compositions

SD 1.5’s training resolution is 512×512. Quality degradation becomes prominent once a side exceeds 768 pixels, so using an upscaler is recommended when larger images are needed.

SDXL

Aspect ratio	Resolution	Use case
1:1	1024×1024	Base size
2:3	832×1216	For portraits
3:2	1216×832	For landscapes and horizontal compositions
9:16	768×1344	For smartphone wallpapers
16:9	1344×768	For wide compositions

SDXL is trained with 1024×1024 as the baseline, so width and height combinations that keep the total pixel count near 1 megapixel (about 1 million pixels) are stable.

Note: Each resolution is an approximation chosen to keep the total pixel count around 1MP, so it may differ slightly from the exact aspect ratio.
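As a rough illustration of how such sizes can be derived, the sketch below takes an integer aspect ratio and a pixel budget and rounds each side down to a multiple of 64. The function name, the 64-pixel step, and the round-down rule are assumptions made for this example (they happen to reproduce the SDXL rows above), not a rule documented by any model.

```python
from math import isqrt


def size_for_aspect(aspect_w, aspect_h, target_pixels=1024 * 1024, multiple=64):
    """Return (width, height) for an integer aspect ratio within a pixel budget.

    Each side is the ideal length rounded DOWN to a multiple of `multiple`,
    so the result never exceeds `target_pixels`. Hypothetical helper.
    """
    # Ideal width squared is target_pixels * aspect_w / aspect_h. Dividing by
    # multiple**2 and taking the integer square root gives the side length in
    # units of `multiple`, rounded down. Integer math avoids float surprises.
    step_sq = multiple * multiple
    w_units = isqrt(target_pixels * aspect_w // (aspect_h * step_sq))
    h_units = isqrt(target_pixels * aspect_h // (aspect_w * step_sq))
    return max(1, w_units) * multiple, max(1, h_units) * multiple


for ratio in [(1, 1), (2, 3), (3, 2), (9, 16), (16, 9)]:
    print(ratio, size_for_aspect(*ratio))
```

Because both sides are rounded down, the resulting ratio drifts slightly from the requested one, for example 832×1216 reduces to 13:19 rather than exactly 2:3.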

SD3 / SD3.5

Aspect ratio	Resolution	Use case
1:1	1024×1024	Base size

The SD3 series is designed around 1024×1024. When changing aspect ratios, you can use SDXL resolutions as a guide.

Flux

Aspect ratio	Resolution	Use case
1:1	1024×1024	Base size
Any	~1MP total pixels	Free ratio

Flux is a model with high aspect ratio flexibility. Keeping total pixel count around 1 million pixels yields stable quality across a wide range of aspect ratios.

Recommended Aspect Ratios by Use Case

The optimal aspect ratio varies depending on where you plan to use the generated image.

Use case	Aspect ratio	SDXL recommended resolution	Notes
Social media posts (Instagram, etc.)	1:1	1024×1024	Ideal for feed posts
Social media posts (Instagram, etc.)	4:5	896×1120	Portrait posts occupy more screen space
Blog thumbnails	16:9	1344×768	Also suitable as OGP images
Portraits	2:3	832×1216	Full body to upper body fits naturally
PC wallpapers	16:9	1344×768	Assuming upscaling
Ultra-wide wallpapers	21:9	1536×640	Upscaling required
Smartphone wallpapers	9:16	768×1344	Assuming upscaling

For wallpaper use cases, the common approach is to output at the resolutions above and then enlarge to the final resolution using an upscaler described below.
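To make the wallpaper workflow concrete, here is a minimal sketch that works out how much enlargement a generated image needs to cover a target display. The function name, the 3840×2160 target, and the assumption that upscalers are available at 2x and 4x are all illustrative choices, not values from a specific tool.

```python
def pick_upscale_factor(gen_size, target_size, available=(2, 4)):
    """Return (needed_scale, chosen_factor) for covering the target size.

    `available` lists the upscaler multipliers assumed to exist (hypothetical).
    chosen_factor is None when no single pass is large enough.
    """
    gen_w, gen_h = gen_size
    target_w, target_h = target_size
    # Smallest scale at which the upscaled image covers the target on both axes.
    needed = max(target_w / gen_w, target_h / gen_h)
    for factor in sorted(available):
        if factor >= needed:
            return needed, factor
    return needed, None


needed, factor = pick_upscale_factor((1344, 768), (3840, 2160))
print(f"needs {needed:.2f}x, use a {factor}x upscaler, then resize or crop")
```

In this example the 1344×768 output needs roughly 2.86x to cover a 3840×2160 display, so a 4x pass followed by a downscale or crop to the exact size would be one way to get there.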

What Happens When Generating at Sizes Different from Training Resolution?

Common Problems

Specifying sizes significantly different from the training resolution tends to cause the following issues:

Anatomy distortion: Human body proportions break down, or the same subject is generated multiple times
Composition breakdown: Unintended zoom or subject duplication
Detail breakdown: Increased occurrence of abnormal finger counts or facial distortion

These problems are particularly pronounced when specifying 1024×1024 or larger directly in SD 1.5.
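A quick way to see how far a requested size strays from the training resolution is to compare pixel counts. The sketch below does only that arithmetic; the dictionary keys are hypothetical labels, and the side lengths are the training resolutions cited earlier in this article.

```python
# Training side lengths as cited in this article; the keys are hypothetical
# labels chosen for this example.
TRAINING_SIDE = {"sd15": 512, "sdxl": 1024}


def pixel_ratio(width, height, model):
    """Return the requested pixel count divided by the training pixel count."""
    side = TRAINING_SIDE[model]
    return (width * height) / (side * side)


for size in [(512, 512), (512, 768), (1024, 1024)]:
    print(size, pixel_ratio(*size, "sd15"))
```

For SD 1.5, 512×768 asks for 1.5 times the training pixel count, while 1024×1024 asks for 4 times, which is consistent with the problems described above becoming much more visible at that size.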

Solution: Using Hires.fix

Hires.fix (High Resolution Fix) is a feature that first generates an image at the training resolution, then upscales it and runs denoising again. This yields a high-resolution image while suppressing composition breakdown.

Generate at the training resolution (e.g., 512×512) and confirm the composition
Upscale by the specified multiplier (e.g., 2x)
Denoise again at the set denoise strength

A denoise strength of 0.4–0.6 is typical. A value that is too low leaves the upscaled image blurry; a value that is too high changes the composition.
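The steps above can be sketched as a small planning helper that computes the second-pass size and flags whether the denoise strength falls in the typical range. This is not the WebUI's implementation; the function name, the returned dictionary, and snapping sizes to multiples of 8 are assumptions for illustration.

```python
def hires_fix_plan(base_w, base_h, scale=2.0, denoise=0.5, multiple=8):
    """Describe the two passes of a Hires.fix-style workflow.

    `multiple=8` is an assumption: sizes are snapped to multiples of 8,
    which latent-diffusion pipelines commonly require.
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return {
        "first_pass": (base_w, base_h),
        "second_pass": (snap(base_w * scale), snap(base_h * scale)),
        "denoise": denoise,
        # 0.4-0.6 is the typical range mentioned in the text above.
        "in_typical_range": 0.4 <= denoise <= 0.6,
    }


print(hires_fix_plan(512, 512, scale=2.0, denoise=0.5))
```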

Upscaling Techniques

Here are the main methods for making generated images even higher resolution.

Hires.fix

As described above, this is the built-in upscaling feature used during generation. It comes standard in WebUIs like AUTOMATIC1111 and Forge. No additional installation is required and it’s easy to use, but VRAM consumption increases.

Ultimate SD Upscale

An extension that combines img2img with tile splitting. By splitting the image into tiles (small areas) and processing them in order, large images can be generated while keeping VRAM usage down. If tile boundary seams are noticeable, adjust the overlap settings.
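The idea of tile splitting with overlap can be illustrated with a short sketch that computes the tile rectangles for a target image. This is not how the extension itself is implemented; the function name, the 512-pixel tile size, and the 64-pixel overlap are assumptions chosen for the example.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes that cover the whole image.

    Neighbouring tiles share at least `overlap` pixels so that seams can be
    blended. Hypothetical helper for illustration.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than tile")
    stride = tile - overlap

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(1024, 1024)
print(len(boxes), boxes[0], boxes[-1])
```

A larger overlap means more tiles and more processing time, but gives the blending step more shared pixels to hide seams with.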

Tiled Diffusion (MultiDiffusion)

A method that splits the image into tiles and denoises each tile in parallel. Similar in purpose to Ultimate SD Upscale, but differs in that the diffusion process itself is done at the tile level. VRAM consumption can be further reduced by combining with Tiled VAE.
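One simple way to merge overlapping tile results is to average them wherever tiles overlap. The one-dimensional sketch below shows only that averaging step, under the assumption that this is roughly how tile outputs are fused; the function name and data layout are hypothetical, and real implementations work on 2D latents and may weight tiles differently.

```python
def merge_tiles_1d(length, tiles):
    """Average overlapping tiles along one axis.

    `tiles` is a list of (start, values) pairs. Positions covered by several
    tiles receive the mean of their values; uncovered positions stay 0.0.
    """
    total = [0.0] * length
    count = [0] * length
    for start, values in tiles:
        for offset, value in enumerate(values):
            total[start + offset] += value
            count[start + offset] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]


# Two tiles of width 4 that overlap on positions 2 and 3.
print(merge_tiles_1d(6, [(0, [1, 1, 1, 1]), (2, [3, 3, 3, 3])]))
```

In the example, the two overlapping positions end up halfway between the two tiles' values, which is what smooths the transition at tile boundaries.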

External Upscalers

A method that uses AI-based super-resolution models to enlarge images after generation.

Tool	Features
Real-ESRGAN	Versatile, supports both photorealistic images and illustrations
4x-UltraSharp	Strong detail enhancement
SwinIR	Swin Transformer-based upscaler
Topaz Gigapixel AI	Commercial software, easy-to-use GUI

External upscalers are independent of the generation process and have the advantage of being applicable to any model’s generated images.

Summary

Resolution and aspect ratio settings are fundamental elements that influence AI image generation quality.

Matching the model’s training resolution is the basic principle for stable quality
Choosing an aspect ratio suited to your use case makes it easier to get the intended composition
When high resolution is needed, generate near the training resolution and then use an upscaler

It’s recommended to start by generating at each model’s recommended resolution, then adjust the aspect ratio according to your use case.
