Conclusion
- Compressing a ~350-word prompt to 94 words (about 1/4) caused no degradation in quality, composition, or mood
- The optimized version rendered the pose and background elements more consistently, since its core elements fit within CLIP’s first 75-token chunk
- The biggest waste is repeating the same concept — Japanese gravure style appeared 6 times, lighting 4 times, skin texture 4 times
- Natural language sentences at the end are completely wasted — due to CLIP’s chunk splitting, prose in later chunks is barely reflected
- Details implied by higher-level concepts can be removed: `curvy feminine silhouette` makes the bust description unnecessary, and `rustic indoor corner` makes the floorboard description unnecessary
Longer prompts ≠ higher quality. In fact, important elements risk being pushed past the 75-token boundary into less effective chunks.
The Prompt Under Test
We tested a gravure photography prompt (~350 words) to see how much redundancy could be removed.
At first glance it looks rich and detailed, but much of it is just the same ideas rephrased over and over.
Issue 1: Massive Concept Duplication
The most serious problem is the same concept being repeated multiple times.
Japanese Gravure Style: 6 Times
- `Japanese celebrity gravure aesthetic`
- `Japanese celebrity photobook style`
- `Japanese gravure-inspired portrait`
- `Japanese celebrity gravure`
- `idol photobook realism`
- `Japanese photobook sensibility`
Once is enough. We consolidated them into a single `Japanese celebrity photobook style`.
Lighting: 4 Times
- `gentle diffused light creates luminous fair highlights and delicate tonal transitions`
- `soft diffused indoor light with a Japanese photobook feel`
- `softened warm indoor light with a cleaner and more delicate finish`
- `brightened skin tones, gentle shadow separation, elegant natural glow`
All of these say “soft indoor light.” A single `soft diffused indoor light` suffices.
Skin Texture: 4 Times
- `porcelain-fair with a soft warm-neutral undertone`
- `soft milky skin texture with natural smoothness and realistic detail`
- `brightened skin tones`
- `fair skin glowing softly`
We consolidated these into `porcelain-fair skin with warm-neutral undertone`. Earlier tests had already shown the `natural skin texture` family of expressions to be ineffective.
Other Duplications
| Concept | Repetitions | Consolidated to |
|---|---|---|
| Soft smile | 3 | closed-lip soft smile |
| Camera direction | 3 | direct eye contact |
| Pose preservation | 3 | Removed (specific pose description is sufficient) |
| Elegant/intimate mood | 3 | Removed (implied by style specification) |
| Depth of field | 3 | shallow depth of field, face in crisp focus, nearest foot blurred |
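Paraphrased duplicates like these are easy to miss by eye. Below is a rough sketch of one way to flag them. It uses per-concept keyword groups (hypothetical, not taken from the tests above) and counts how many comma-separated segments hit each group. Keyword matching misses paraphrases that share no keyword, and it can count one segment under two concepts. Treat the counts as a prompt for manual review, not a verdict.

```python
import re

# Hypothetical keyword groups for illustration; tune them to your own prompts.
CONCEPT_KEYWORDS = {
    "japanese_gravure_style": [r"gravure", r"photobook"],
    "lighting": [r"\blight\b", r"shadow", r"glow"],
}


def split_segments(prompt):
    """Split a prompt into comma-separated segments, dropping empty ones."""
    return [s.strip() for s in prompt.split(",") if s.strip()]


def count_concept_mentions(segments, concept_keywords):
    """Count how many segments mention each concept (at most one hit per segment)."""
    return {
        concept: sum(
            1
            for seg in segments
            if any(re.search(p, seg, re.IGNORECASE) for p in patterns)
        )
        for concept, patterns in concept_keywords.items()
    }


# The six gravure-style phrases listed above, joined as they might appear in a prompt.
gravure = (
    "Japanese celebrity gravure aesthetic, Japanese celebrity photobook style, "
    "Japanese gravure-inspired portrait, Japanese celebrity gravure, "
    "idol photobook realism, Japanese photobook sensibility"
)
print(count_concept_mentions(split_segments(gravure), CONCEPT_KEYWORDS))
# → {'japanese_gravure_style': 6, 'lighting': 0}
```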
Issue 2: Ineffective and Redundant Expressions
The prompt also contained expressions that earlier tests had already shown to be ineffective or redundant.
| Expression | Reason | Source |
|---|---|---|
| `soft milky skin texture with natural smoothness and realistic detail` | The `natural skin texture` family has no effect | God Prompt Ablation |
| `realistic magazine-quality digital photo` | z-image-turbo is photorealistic by default | Prompt Optimization 10 Themes |
| `clean image with refined skin rendering` | Quality keyword with unverified effect | Prompt Optimization 10 Themes |
| `adult woman, late-20s to early-30s appearance` | Already implied by `32yo` | N/A |
| 7 makeup detail items | Implied by `Japanese celebrity makeup` | Profession Prompt Test |
Issue 3: Natural Language Sentences at the End
The prompt ends with ~50 words of prose:
She settled into the old chair and held the same relaxed pose, but the light now flatters her like a Japanese photobook cover—fair skin glowing softly, expression composed, the room turning gentle around her, soft star aura, elegant closeness, photobook charm
Our CLIP 75-token chunk test confirmed that elements in later chunks are unstable and only partially reflected. With a 350-word prompt split across 4-5 chunks, this final prose is essentially ignored.
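To see where a given phrase lands, you can walk the prompt segment by segment and track a running token count. The sketch below does that under a loud assumption: it counts one token per word plus one per comma. This is a hypothetical stand-in for the real CLIP BPE tokenizer, which often splits a word into several tokens, so real positions usually fall later than this estimate. For accurate counts, swap in a real tokenizer, for example the CLIP tokenizer in Hugging Face transformers.

```python
CHUNK_SIZE = 75  # prompt tokens per chunk, excluding the start/end markers


def estimate_tokens(segment):
    """Hypothetical stand-in for a real tokenizer: one token per word plus
    one for the separating comma. CLIP's BPE often splits words into
    several tokens, so real counts are usually higher."""
    return len(segment.split()) + 1


def chunk_of_each_segment(prompt, chunk_size=CHUNK_SIZE):
    """Return (segment, chunk_index) pairs based on where each segment starts."""
    placements = []
    position = 0
    for seg in (s.strip() for s in prompt.split(",")):
        if not seg:
            continue
        placements.append((seg, position // chunk_size))
        position += estimate_tokens(seg)
    return placements


# 80 two-token filler segments push a closing phrase into the third chunk (index 2).
demo = ", ".join(["filler"] * 80) + ", final prose"
print(chunk_of_each_segment(demo)[-1])  # → ('final prose', 2)
```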
Optimized Prompt
Here’s the result after fixing all the above issues.
~350 words → ~120 words (66% reduction). All essential elements preserved; duplicates and ineffective expressions removed.
Comparison Results
We generated images with the same seeds (42, 123, 456) using both the original and optimized prompts.
Seed 42
[Images: Original (~350 words) vs. Optimized (~120 words)]
…bands at thighs) more clearly.
Seed 123
[Images: Original (~350 words) vs. Optimized (~120 words)]
…one arm bent behind the head).
Seed 456
[Images: Original (~350 words) vs. Optimized (~120 words)]
Comparison Summary
| Aspect | Original | Optimized |
|---|---|---|
| Pose rendered as intended | Stable in 2/3 images | Stable in 3/3 images |
| Background elements rendered | Lace/flooring inconsistent | Consistently present |
| Black lingerie rendered | Unclear in 1/3 images | Clear in 3/3 images |
| Lighting | Soft indoor light | Equivalent |
| Skin texture | Natural | Equivalent |
Lab Director’s Take: The shorter version actually nails the pose more consistently. Makes total sense with CLIP’s chunk splitting — but seeing it side by side really drives it home. All those 350 words and the back half was just… noise.
Follow-Up: Compressing 120 Words Down to 94
The 120-word optimized prompt still had room to cut. Based on verified findings, we trimmed seven more areas.
| Removed Expression | Reason |
|---|---|
| `deep smooth` (hair modifiers) | `dark brown` is sufficient; texture modifiers unverified |
| `loosely tucked back on one side` | Implied by `side part` |
| `full natural bust contour` | Implied by `curvy feminine silhouette` |
| `wooden` (armchair in the pose line) | Already described as `vintage wooden armchair` in the background |
| `in crisp focus, nearest foot blurred` → `in focus` | Implied by `shallow depth of field` + composition |
| `vertical three-quarter body shot` | Overlaps with `3:4 vertical` + `foreground-heavy foreshortened composition` |
| `warm brown wood tones`, `weathered wooden floorboards` | Implied by `rustic indoor corner` (ablation test) |
~120 words → ~94 words (further 22% reduction, 73% from the original 350).
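The reduction figures are simple ratios and easy to check. The word counts are approximate, so this only confirms that the rounded percentages (66%, 22%, 73%) are consistent with each other.

```python
def reduction_pct(before_words, after_words):
    """Percentage of words removed when going from before_words to after_words."""
    return (1 - after_words / before_words) * 100


for before, after in [(350, 120), (120, 94), (350, 94)]:
    print(f"{before} -> {after} words: {reduction_pct(before, after):.1f}% reduction")
# 350 -> 120 words: 65.7% reduction
# 120 -> 94 words: 21.7% reduction
# 350 -> 94 words: 73.1% reduction
```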
120-Word vs 94-Word Comparison
Compared using the same seeds (42, 123, 456).
Seed 42
[Images: 120-word version vs. 94-word version]
Seed 123
[Images: 120-word version vs. 94-word version]
Seed 456
[Images: 120-word version vs. 94-word version]
Follow-Up Comparison Summary
| Aspect | 120-word | 94-word |
|---|---|---|
| Pose (arm behind head) | 3/3 stable | 3/3 stable |
| Armchair + cushion | 3/3 present | 3/3 present |
| Lace background | 3/3 present | 3/3 present |
| Black lingerie | 3/3 present | 3/3 present |
| Floorboards | 3/3 clear | Slightly less prominent in 2/3 |
| Depth of field | Shallow | Equivalent |
| Hair | Dark brown shoulder-length | Equivalent |
| Body type | Natural | Equivalent (removing `full natural bust contour` had no visible effect) |
The only difference at 94 words is slightly less prominent floorboard rendering. Since `rustic indoor corner` still implies the wood texture, the overall atmosphere is maintained. If you absolutely need explicit floorboards, keep just `wooden floorboards`, which adds back only 2 words rather than the whole removed phrase.
Lab Director’s Take: Thought we’d hit the floor at 120 words, but nope — another 20% gone.
`curvy feminine silhouette` covering the bust description is a great example of how higher-level concepts do the heavy lifting.
Why Shorter Is More Stable
Reviewing the CLIP 75-token chunk splitting mechanism makes the reason clear.
Original prompt (~350 words):
- Split into 4-5 chunks
- Pose description spans chunks 1 and 2
- Background/clothing pushed to chunk 3+
- Final prose lands in chunk 5, essentially ignored
Optimized prompt (~120 words):
- Fits in 1-2 chunks
- Subject, pose, and expression all fit in chunk 1
- Background, composition, and style in chunk 2
- No wasted tokens, so attention is distributed across all elements
Further compressed (~94 words):
- Fits almost entirely in 1 chunk (minimal spillover to chunk 2)
- Details implied by higher-level concepts are removed; CLIP fills them from context
- Fewer tokens means more even attention distribution per token
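The chunking described above can be sketched as follows. This is a simplified model built on several assumptions:

- Token ids come from a CLIP tokenizer, which is not shown here.
- The prompt is cut every 75 tokens. Some front ends move the cut back to a nearby comma, and this sketch omits that.
- Each chunk is then encoded separately and the results concatenated. This is how WebUI-style front ends are commonly described as handling long prompts.

The BOS/EOS ids below are those of the OpenAI CLIP vocabulary. Padding with EOS is an assumption, and pipelines differ on it.

```python
BOS = 49406  # <|startoftext|> in the OpenAI CLIP vocabulary
EOS = 49407  # <|endoftext|>; also used as padding here (assumption)
CHUNK = 75   # prompt tokens per chunk; 75 + BOS + EOS = 77 positions


def split_into_chunks(token_ids, chunk=CHUNK):
    """Split prompt token ids into fixed-size chunks, each wrapped as
    [BOS] + tokens + [EOS] and padded with EOS to chunk + 2 positions."""
    chunks = []
    # max(..., 1) makes an empty prompt still yield one (all-padding) chunk.
    for start in range(0, max(len(token_ids), 1), chunk):
        body = token_ids[start:start + chunk]
        chunks.append([BOS] + body + [EOS] * (chunk + 1 - len(body)))
    return chunks


# 160 dummy token ids -> 3 chunks of 77 positions; the last holds only 10 real tokens.
chunks = split_into_chunks(list(range(160)))
print(len(chunks), [len(c) for c in chunks])  # → 3 [77, 77, 77]
```

In this model, a prompt that stays under 75 tokens gets a single chunk. Everything past that boundary is encoded in a separate chunk with no context from the first one.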
Deletion Checklist
A checklist for compressing your own prompts.
Safe to Delete Immediately
- Second and later mentions of the same concept: style, lighting, skin texture, mood adjectives
- `realistic`, `photorealistic`: z-image-turbo’s default
- `natural skin texture`, `coherent anatomy`: verified ineffective
- Natural language summary at the end: barely reflected in CLIP’s later chunks
- Age paraphrases like `adult woman`: `32yo` is sufficient
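Part of this list is mechanical and can be automated: dropping known-ineffective keywords and exact repeats. Paraphrased repeats and the trailing prose still need a human pass. Below is a minimal sketch that assumes comma-separated prompts and uses a hypothetical blocklist drawn from the items above.

```python
# Hypothetical blocklist drawn from the "Safe to Delete Immediately" items.
SAFE_TO_DELETE = {
    "realistic",
    "photorealistic",
    "natural skin texture",
    "coherent anatomy",
    "adult woman",
}


def drop_safe_deletes(prompt, blocklist=SAFE_TO_DELETE):
    """Remove blocklisted segments and exact (case-insensitive) repeats,
    keeping the first occurrence of everything else in order."""
    kept, seen = [], set()
    for seg in (s.strip() for s in prompt.split(",")):
        key = seg.lower()
        if not seg or key in blocklist or key in seen:
            continue  # empty, known-ineffective, or already mentioned
        seen.add(key)
        kept.append(seg)
    return ", ".join(kept)


print(drop_safe_deletes(
    "32yo japanese actress, adult woman, photorealistic, "
    "soft diffused indoor light, Soft diffused indoor light"
))
# → 32yo japanese actress, soft diffused indoor light
```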
Implied by Higher-Level Concepts
- `Japanese celebrity makeup` → individual makeup details (eyeliner, aegyo-sal, brows, blush, lips) are implied
- `summer festival` → lanterns and food stalls appear naturally (confirmed in ablation test)
- `rustic indoor corner` → `warm brown wood tones`, `weathered wooden floorboards`, `dark wooden structural elements` are implied
- `curvy feminine silhouette` → `full natural bust contour` is implied
- `shallow depth of field` + composition → `face in crisp focus`, `nearest foot blurred` are implied
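Once you know which implications hold, pruning the implied details is a lookup. Below is a sketch built on a hypothetical implication map taken from the list above. Deciding what a higher-level phrase implies still comes from ablation testing, not from code. The floorboard result earlier also shows that an implication can come out slightly weaker than the explicit phrase.

```python
# Hypothetical implication map based on the list above: when the key phrase
# is present, the listed detail phrases are treated as redundant.
# Simplification: the depth-of-field rule ignores the "+ composition" condition.
IMPLIED_BY = {
    "rustic indoor corner": {
        "warm brown wood tones",
        "weathered wooden floorboards",
        "dark wooden structural elements",
    },
    "curvy feminine silhouette": {"full natural bust contour"},
    "shallow depth of field": {"face in crisp focus", "nearest foot blurred"},
}


def prune_implied(segments, implied_by=IMPLIED_BY):
    """Drop segments implied by a higher-level phrase that is also present."""
    present = {s.lower() for s in segments}
    redundant = set()
    for parent, children in implied_by.items():
        if parent in present:
            redundant |= children
    return [s for s in segments if s.lower() not in redundant]


print(prune_implied([
    "curvy feminine silhouette", "full natural bust contour",
    "rustic indoor corner", "weathered wooden floorboards", "3:4 vertical",
]))
# → ['curvy feminine silhouette', 'rustic indoor corner', '3:4 vertical']
```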
Synonymous Composition Descriptions
- `3:4 vertical` already covers the “vertical” in `vertical three-quarter body shot`
- `foreground-heavy foreshortened composition` + pose description is sufficient for composition
Keep These
- Specific poses: `one arm bent behind the head`, `torso slightly twisted`
- Lighting (once only): `soft diffused indoor light`
- Composition: `foreground-heavy foreshortened composition`, `3:4 vertical`
- Core subject attributes: `32yo japanese actress`, hairstyle, body type
- Style (once only): `Japanese celebrity photobook style`
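After compressing, it can help to confirm that nothing on the keep list was cut by accident. This check also confirms that the once-only items really appear once. Below is a sketch with a hypothetical must-keep list mirroring the items above. The sample string is assembled from those phrases for illustration and is not the optimized prompt from this test. Hairstyle and body type vary per prompt, so they are left out of the fixed-string check.

```python
# Hypothetical must-keep phrases mirroring the "Keep These" list above.
MUST_KEEP = [
    "one arm bent behind the head",
    "torso slightly twisted",
    "soft diffused indoor light",
    "foreground-heavy foreshortened composition",
    "3:4 vertical",
    "32yo japanese actress",
    "japanese celebrity photobook style",
]
# Items the list says should appear exactly once.
ONCE_ONLY = ["soft diffused indoor light", "japanese celebrity photobook style"]


def check_compressed(prompt):
    """Return (missing must-keep phrases, once-only phrases that repeat)."""
    text = prompt.lower()
    missing = [p for p in MUST_KEEP if p not in text]
    repeated = [p for p in ONCE_ONLY if text.count(p) > 1]
    return missing, repeated


sample = (
    "32yo japanese actress, one arm bent behind the head, torso slightly twisted, "
    "soft diffused indoor light, foreground-heavy foreshortened composition, "
    "3:4 vertical, Japanese celebrity photobook style"
)
print(check_compressed(sample))  # → ([], [])
```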
Lab Director’s Take: A prompt is an instruction manual, not poetry. Rephrasing the same thing in beautiful variations won’t impress CLIP.











![[Verified] Image Generation Prompt Best Practices](/tips/prompt-best-practices/cover_0_0000_4517457392071889496.webp)