Conclusion
- Compressing from 200+ tokens to ~40 tokens showed no degradation in image quality or core element reproduction
- Camera model names, lens specs, aperture, ISO, and other shooting parameters have no effect on output
- Quality keywords like `ultra-detailed skin texture` and `sharp focus` are also ineffective
- Repeating the same concept three times is no better than writing it once; it just wastes tokens
Do you believe longer prompts produce better images? That writing camera model names and F-stops makes photos more realistic? We put that urban legend to the test.
The Prompt Under Test
The subject is a studio portrait prompt: a nude woman against a red gradient backdrop with a circular light.
This prompt has several issues.
Problem Analysis
1. Elements Verified as Ineffective
| Element | Category | Source |
|---|---|---|
| `Camera: Medium format, Hasselblad X2D / Sony A1` | Camera model | Confirmed ineffective in Bikini Prompt Iteration |
| `Lens: 85mm prime`, `Aperture: f/4` | Lens & aperture | Same as above |
| `Shutter Speed: 1/160`, `ISO: 100` | Shooting params | Same as above |
| `ultra-detailed skin texture` | Quality keyword | `natural skin texture` confirmed ineffective in Prompt Optimization 10 Themes |
| `sharp focus` | Quality keyword | z-image-turbo outputs sharp images by default |
2. Redundant Descriptions
The same concept appears in multiple places.
| Concept | Occurrences | Needed |
|---|---|---|
| Confident pose | standing confidently / confident stance / Calm dominant self-assured | Once is enough |
| Hand in pocket | one hand in his pocket / One hand in pocket | Once is enough |
| Dramatic lighting | dramatic lighting / Key Light, Fill Light, Rim Light details | One phrase |
3. Token Count Issues
CLIP processes 75 tokens per chunk. Influence drops off in subsequent chunks. At 200+ tokens spanning 3+ chunks, the camera specs and lighting details in the back half are likely being ignored entirely.
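The chunk arithmetic above can be sketched in a few lines. This is only an illustration: the 75-token chunk size comes from the text, while the function name `chunks_needed` is hypothetical and the token counts are the article's estimates, not the output of a real tokenizer.

```python
import math

CHUNK_SIZE = 75  # tokens per chunk, as stated in the text


def chunks_needed(token_count: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Return how many fixed-size chunks a prompt of `token_count` tokens spans."""
    if token_count <= 0:
        return 0
    return math.ceil(token_count / chunk_size)


# Estimated counts from the article: ~40 tokens (optimized), 200 tokens (original).
for n in (40, 75, 76, 200):
    print(n, "tokens ->", chunks_needed(n), "chunk(s)")
```

A 200-token prompt spans three chunks, while the ~40-token version fits in the first one.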
Optimized Version
Keeping only the core elements:
What Was Removed
- All camera/lens/shooting parameters: Camera, Lens, Aperture, Shutter Speed, ISO, White Balance, Focus
- Quality keywords: `sharp focus`, `ultra-detailed skin texture`, `premium magazine aesthetic`, `clean composition`
- Redundant expressions: `confident stance`, `Calm dominant self-assured`, `Slight lean`, etc.
- Detailed lighting breakdown: Key/Fill/Rim Light specs → consolidated into `dramatic front-left softbox lighting`
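The removal pass listed above could be roughly automated. The sketch below is an assumption-laden illustration, not the method used in this test: the pattern list, the function name `strip_ineffective`, and the example prompt are all hypothetical, and it only catches labeled parameters, two quality keywords, and exact repeats.

```python
import re

# Hypothetical patterns for the fragment types removed in this article.
INEFFECTIVE_PATTERNS = [
    r"^(camera|lens|aperture|shutter speed|iso|white balance|focus)\s*:",
    r"\bsharp focus\b",
    r"\bultra-detailed skin texture\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in INEFFECTIVE_PATTERNS]


def strip_ineffective(prompt: str) -> str:
    """Drop comma-separated fragments that match a pattern or repeat an earlier one."""
    kept = []
    seen = set()
    for fragment in prompt.split(","):
        text = fragment.strip()
        if not text:
            continue
        if any(p.search(text) for p in _COMPILED):
            continue  # camera spec or quality keyword
        key = text.lower()
        if key in seen:
            continue  # exact repeat of an earlier fragment
        seen.add(key)
        kept.append(text)
    return ", ".join(kept)


# Illustrative prompt, not the prompt under test.
example = (
    "studio portrait, red gradient backdrop, Camera: Hasselblad X2D, "
    "Lens: 85mm prime, Aperture: f/4, ISO: 100, sharp focus, "
    "red gradient backdrop, dramatic lighting"
)
print(strip_ineffective(example))
```

Splitting on commas is a simplification: a spec that itself contains a comma would leave a stray fragment behind, and paraphrased repeats such as `standing confidently` versus `confident stance` still need a manual pass.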
What Was Kept
- Style: `A sophisticated studio portrait` (sets overall direction at the front)
- Subject: `1girl, 32yo japanese actress, full nude`
- Pose: `standing with one hand on hip` (mentioned once)
- Background: `bold red gradient backdrop, large glowing circular light behind her`
- Lighting: `dramatic front-left softbox lighting, high contrast`
- Finish: `luxury editorial style, cinematic color grading, modern minimalism`
- Composition: `3/4 body portrait, subject slightly off-center`
Method
| Parameter | Value |
|---|---|
| Model | z-image-turbo |
| Steps | 8 |
| Sampler | euler |
| Scheduler | ddim_uniform |
| CFG | 1.0 |
| Image size | 1024×1024 |
| Seeds | 42, 123, 456 (3 images per condition) |
Note: Because the token sequences differ between prompts, images differ even with the same seed. This is the same phenomenon documented in Prompt Fundamentals regarding weight syntax — a side effect of token sequence changes, not a quality difference. The comparison target is whether core elements (red backdrop, circular light, nude, studio portrait) are equally reproduced.
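For reference, the test matrix described above can be written out as data. This is a minimal sketch: the values come from the Method table, but the dictionary keys and the `build_runs` helper are hypothetical names, not the API of any particular generation tool.

```python
from itertools import product

# Values taken from the Method table; key names are illustrative only.
SETTINGS = {
    "model": "z-image-turbo",
    "steps": 8,
    "sampler": "euler",
    "scheduler": "ddim_uniform",
    "cfg": 1.0,
    "width": 1024,
    "height": 1024,
}
SEEDS = [42, 123, 456]
CONDITIONS = ["original", "optimized"]


def build_runs():
    """Return one run description per (seed, condition) pair."""
    return [
        {"condition": condition, "seed": seed, **SETTINGS}
        for seed, condition in product(SEEDS, CONDITIONS)
    ]


runs = build_runs()
print(len(runs), "runs")
for run in runs:
    print(run["seed"], run["condition"])
```

Holding every setting fixed and varying only the prompt and the seed is what lets the six images be compared pairwise.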
Comparison Results
Seed 42
| Original (200+ tokens) | Optimized (~40 tokens) |
|---|---|
| (image: original, seed 42) | (image: optimized, seed 42) |
Both reproduce the red gradient backdrop, circular light, and studio portrait style. No visible difference in skin texture or lighting quality.
Seed 123
| Original (200+ tokens) | Optimized (~40 tokens) |
|---|---|
| (image: original, seed 123) | (image: optimized, seed 123) |
The original is more frontal while the optimized version has an angled composition. This comes from the two prompts having different token sequences, not from a quality difference. Red backdrop and circular light are reproduced in both.
Seed 456
| Original (200+ tokens) | Optimized (~40 tokens) |
|---|---|
| (image: original, seed 456) | (image: optimized, seed 456) |
Stable composition in both. Skin texture is equivalent despite removing `ultra-detailed skin texture` from the optimized version.
Analysis
Core Element Reproduction
All core elements were reproduced across all 6 images (3 seeds × 2 conditions):
- Red gradient backdrop: 6 of 6
- Circular light: 6 of 6
- Studio portrait style: 6 of 6
- Nude: 6 of 6
No evidence was found that the additional elements in the 200+ token prompt (camera model, F-stop, ISO, Key Light angle, etc.) were reflected in the output.
Why Don’t Camera Specs Work?
This is speculative, but shooting parameters like Aperture: f/4 and ISO: 100 may appear in CLIP’s training data as camera metadata without being strongly associated with visual features of images. As a result, these specifications consume tokens without contributing to output.
Token Efficiency
| | Original | Optimized |
|---|---|---|
| Estimated tokens | 200+ (3+ chunks) | ~40 (within 1 chunk) |
| Core element reproduction | 6/6 | 6/6 |
| Quality difference | - | None observed |
The original far exceeds the 75-token boundary, while the optimized version fits within a single chunk, yet the two produce equivalent results across all three seeds. This suggests that what you write matters more than how much you write.
Lab Director: You thought writing “Hasselblad X2D” would make it look like it was shot on a Hasselblad? Nope. Those 5 tokens are way better spent on poses and lighting direction.
Summary
The following elements can be safely removed from prompts:
| Element | Tokens Saved |
|---|---|
| Camera model names | 5-6 |
| Lens focal length & aperture | 3-4 |
| Shutter speed, ISO, white balance | 5-6 |
| `sharp focus`, `ultra-detailed skin texture` | 5 |
| Redundant expressions of the same concept | Variable (10-20) |
| Individual Key/Fill/Rim Light details | 15-20 |
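The per-element estimates in the table can be totaled to see how much room is freed. The sketch below only sums the ranges listed above; the `SAVINGS` mapping and `total_savings` helper are illustrative names, and the figures remain rough estimates rather than tokenizer output.

```python
# (low, high) token estimates copied from the table above.
SAVINGS = {
    "Camera model names": (5, 6),
    "Lens focal length & aperture": (3, 4),
    "Shutter speed, ISO, white balance": (5, 6),
    "sharp focus, ultra-detailed skin texture": (5, 5),
    "Redundant expressions of the same concept": (10, 20),
    "Individual Key/Fill/Rim Light details": (15, 20),
}


def total_savings(savings):
    """Sum the low and high ends of each estimated range."""
    low = sum(lo for lo, _ in savings.values())
    high = sum(hi for _, hi in savings.values())
    return low, high


low, high = total_savings(SAVINGS)
print(f"Estimated tokens freed: {low}-{high}")  # Estimated tokens freed: 43-61
```

Roughly 43 to 61 tokens is more than half of a 75-token chunk.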
Reallocate those freed tokens to scene descriptions, poses, and lighting direction — elements that actually affect the image. That’s what prompt optimization is really about.
Lab Director: There’s this vibe that longer prompts = more dedication, but when you actually test it, the entire back half of camera specs gets ignored. Short prompts that hit hard — that’s the way.








