Stable Diffusion prompts behave differently from the conversational style you might use with a chatbot. The model reads a comma-separated bag of tokens, weighs each one, and paints from that mix. Learn the syntax and you stop fighting the model. Get it wrong and you get muddy color, melted hands, and a composition that ignored half of what you asked for.

This guide covers the grammar that actually moves the needle: token order, attention weighting, the role of CFG scale and step count, and which sampler to reach for. Every example is copy-pasteable. Paste it into Automatic1111, ComfyUI, Forge, or whatever front end you run, change the subject, and go.

How Stable Diffusion prompts are read

The prompt is tokenized, then encoded by a text encoder: CLIP ViT-L in SD 1.5, while SDXL pairs that encoder with a second, larger OpenCLIP model. The model does not parse sentences for meaning the way you do; it keys off recognizable tokens and the weight attached to each. Three practical consequences follow.

First, order carries weight: tokens near the front tend to get more influence, so put the subject and the two or three things you care about most up top. Second, commas separate concepts and give the encoder clean boundaries. Third, the encoder reads prompts in 75-token chunks (CLIP's 77-position limit minus the start and end markers). Front ends stitch chunks together, but a 200-word prompt dilutes everything. Tighter prompts usually beat sprawling ones.
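To get a feel for how quickly a prompt fills that window, here is a rough Python sketch. It assumes a crude stand-in tokenizer (one token per word or punctuation mark); CLIP's real byte-pair tokenizer splits unusual words into several pieces, so true counts run higher. The names `rough_tokens` and `chunk_prompt` are hypothetical helpers, not part of any front end.

```python
import re

CHUNK_SIZE = 75  # usable tokens per CLIP window: 77 positions minus start and end markers

def rough_tokens(prompt: str) -> list[str]:
    """Approximate tokenization: one token per word or punctuation mark.

    Assumption: this undercounts compared with CLIP's BPE tokenizer,
    which can split a rare word into several sub-word tokens.
    """
    return re.findall(r"\w+|[^\w\s]", prompt)

def chunk_prompt(prompt: str, size: int = CHUNK_SIZE) -> list[list[str]]:
    """Split the approximate token list into fixed-size windows, one per encoder pass."""
    tokens = rough_tokens(prompt)
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

baseline = ("portrait of a weathered fisherman, deep wrinkles, salt-and-pepper beard, "
            "wearing a yellow raincoat, standing on a harbor at dawn")
print(len(rough_tokens(baseline)), "tokens in", len(chunk_prompt(baseline)), "chunk(s)")
```

A prompt that spills into a second or third chunk still renders; treat it as a hint to trim rather than an error.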

A reliable skeleton: subject, descriptors, style, medium, lighting, composition, quality tags. You do not need all seven slots every time, but the order keeps you honest.
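If you assemble prompts in a script, a small helper can enforce that order no matter how the pieces arrive. A minimal Python sketch; `build_prompt` and the slot names are hypothetical, taken from the skeleton above rather than from any front end's API.

```python
# Slot order from the skeleton above; names are illustrative, not a front-end API.
SLOT_ORDER = ("subject", "descriptors", "style", "medium",
              "lighting", "composition", "quality")

def build_prompt(**slots) -> str:
    """Join the filled slots in skeleton order; empty or missing slots are skipped."""
    unknown = set(slots) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown slot(s): {sorted(unknown)}")
    parts = []
    for name in SLOT_ORDER:
        value = slots.get(name) or []
        if isinstance(value, str):
            value = [value]  # allow a single string or a list of strings per slot
        parts.extend(item.strip() for item in value if item.strip())
    return ", ".join(parts)

print(build_prompt(
    quality=["photorealistic", "sharp focus"],
    subject="portrait of a weathered fisherman",
    lighting="soft overcast light",
    composition=["shallow depth of field", "50mm lens"],
))
```

Keyword arguments can be passed in any order; the subject still lands first, which is the point of the skeleton.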

A baseline prompt you can paste

Positive:
portrait of a weathered fisherman, deep wrinkles, salt-and-pepper beard,
wearing a yellow raincoat, standing on a harbor at dawn,
soft overcast light, shallow depth of field, 50mm lens,
photorealistic, highly detailed, sharp focus

Negative:
lowres, blurry, deformed, extra fingers, bad anatomy, watermark, text

Steps: 28
Sampler: DPM++ 2M Karras
CFG scale: 6.5
Size: 832x1216

Run that, then change one variable at a time. Swap the lens, move the lighting, drop a descriptor. Single-variable changes teach you what each token does. Changing five things at once teaches you nothing.
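One way to keep the discipline is to generate variations mechanically, so every render differs from the baseline in exactly one setting. A minimal Python sketch; the dictionary keys and the fixed seed value are hypothetical placeholders, not front-end parameter names.

```python
from typing import Iterator

# Hypothetical baseline mirroring the settings above; the seed stays fixed on purpose.
BASELINE = {
    "lens": "50mm lens",
    "lighting": "soft overcast light",
    "steps": 28,
    "cfg_scale": 6.5,
    "seed": 1234,
}

def one_at_a_time(baseline: dict, alternatives: dict) -> Iterator[tuple[str, dict]]:
    """Yield (changed_key, settings) pairs that differ from the baseline in exactly one key."""
    for key, options in alternatives.items():
        for option in options:
            if option == baseline.get(key):
                continue  # identical to the baseline, nothing new to learn
            settings = dict(baseline)
            settings[key] = option
            yield key, settings

for key, settings in one_at_a_time(BASELINE, {"lens": ["35mm lens", "85mm lens"],
                                               "lighting": ["golden hour light"]}):
    print(f"changed {key}: {settings[key]}")
```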

Attention weights: the (token:1.3) syntax

This is the part people skip, and it is the single biggest lever in Stable Diffusion prompts. You can tell the model how hard to push any token. The syntax is parentheses with a colon and a number.

  • (red coat:1.3) — multiply attention by 1.3. The coat gets pushed harder.
  • (red coat:0.7) — multiply by 0.7. The coat gets dialed back, useful when a token is dominating.
  • (red coat) — shorthand for roughly 1.1.
  • ((red coat)) — nested parens stack, roughly 1.21. Prefer the explicit number; it is easier to reason about.
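To make the stacking rules concrete, here is a simplified Python parser that turns a prompt into (text, weight) pairs. It is a sketch that handles only the three forms listed above; Automatic1111's real parser also understands square brackets, escaped parentheses and other cases, so treat the hypothetical `parse_attention` as illustrative rather than as any front end's code.

```python
import re

# One token is: "(", an explicit ":number)" closer, a bare ")", plain text, or a stray ":".
_TOKEN = re.compile(r"\(|:\s*(\d*\.?\d+)\s*\)|\)|[^():]+|:")

def parse_attention(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) pairs using the parenthesis rules above."""
    segments = []  # [text, weight] lists, mutable so enclosing groups can rescale them
    open_at = []   # index into `segments` where each "(" began
    for match in _TOKEN.finditer(prompt):
        token, explicit = match.group(0), match.group(1)
        if token == "(":
            open_at.append(len(segments))
        elif token == ")" or explicit is not None:
            if not open_at:
                continue  # unmatched closer: ignored in this sketch
            # "(text:1.3)" scales by the number; a bare "(text)" scales by 1.1.
            factor = float(explicit) if explicit is not None else 1.1
            for seg in segments[open_at.pop():]:
                seg[1] *= factor  # nested groups multiply, so ((x)) ends at 1.21
        else:
            segments.append([token, 1.0])
    # Any "(" left open is ignored here; real front ends may handle it differently.
    return [(text.strip(" ,"), round(weight, 4))
            for text, weight in segments if text.strip(" ,")]

print(parse_attention("(red coat:1.3), ((rain)), harbor at dawn"))
```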

Stay in a sane range. Weights from 0.5 to 1.5 cover almost everything. Push past 1.6 and tokens start to deep-fry the image: oversaturated color, fried edges, the model obsessing over one element until the rest collapses. If you need more than 1.5, the token is probably in the wrong place or competing with something else.
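A quick lint pass can catch out-of-range weights before you spend a render on them. A Python sketch; `flag_weights` is a hypothetical helper, and its default bounds simply restate the 0.5 to 1.5 guidance above. It inspects only explicit `(token:number)` groups, not the implicit boost from bare or nested parentheses.

```python
import re

# Matches explicit groups like "(red coat:1.3)"; bare or nested parentheses are not checked.
_EXPLICIT = re.compile(r"\(([^():]+):\s*(\d*\.?\d+)\s*\)")

def flag_weights(prompt: str, low: float = 0.5, high: float = 1.5) -> list[tuple[str, float]]:
    """Return (token, weight) pairs whose explicit weight falls outside [low, high]."""
    flagged = []
    for match in _EXPLICIT.finditer(prompt):
        token, weight = match.group(1).strip(), float(match.group(2))
        if not low <= weight <= high:
            flagged.append((token, weight))
    return flagged

print(flag_weights("(neon signs:1.8), (volumetric fog:1.1), (film grain:0.3)"))
```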

Positive:
(cyberpunk street market:1.2), neon signs, rain-slick pavement,
(volumetric fog:1.1), crowd of vendors, steam rising,
cinematic, (dramatic rim lighting:1.3), wide shot,
highly detailed, 8k

Negative:
(blurry:1.2), lowres, jpeg artifacts, oversaturated,
deformed, extra limbs, watermark, signature

Steps: 30
Sampler: DPM++ 2M Karras
CFG scale: 7
Size: 1216x832

Blending and scheduling

Two more operators are worth knowing. Both are Automatic1111-style syntax, so other front ends may not accept them. Prompt editing swaps a token partway through the render: [oak tree:pine tree:0.4] starts on the oak and switches to pine at 40% of the steps. It is useful for morphs and for nudging a shape without committing to it. The AND keyword composites two prompts: a castle AND a thunderstorm tells the model to honor both rather than averaging them into mush. Use these sparingly. They are precision tools, not everyday grammar.
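The timing of a prompt edit is simple arithmetic: a fractional third value is multiplied by the step count. The Python sketch below assumes the switch point is truncated to a whole step and that a value of 1 or more means an absolute step number; check your front end if the exact step matters. `edit_schedule` is a hypothetical helper.

```python
def edit_schedule(first: str, second: str, when: float, steps: int) -> list[str]:
    """Return which prompt is active at each step for [first:second:when].

    Assumptions: when < 1 is a fraction of total steps, when >= 1 is an
    absolute step, and the switch point is truncated to a whole step.
    """
    switch = int(when * steps) if when < 1 else int(when)
    switch = max(0, min(steps, switch))  # keep the switch inside the render
    return [first if step < switch else second for step in range(steps)]

schedule = edit_schedule("oak tree", "pine tree", 0.4, 30)
print(schedule.count("oak tree"), "oak steps,", schedule.count("pine tree"), "pine steps")
```

With 30 steps and 0.4, the first 12 steps use the oak and the remaining 18 use the pine, which is why the early, shape-defining steps decide the silhouette.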

CFG scale, steps, and samplers

Prompt text is half the job. The sampler settings decide how faithfully and how cleanly that prompt becomes pixels.

CFG scale (classifier-free guidance) controls how literally the model obeys your prompt. Low CFG lets the model improvise; high CFG forces compliance at the cost of contrast and naturalness.

  • 3 to 5 — loose, painterly, the model fills gaps with its own ideas. Good for abstract or dreamy work.
  • 6 to 8 — the everyday range. Obeys the prompt while keeping color and contrast believable. Start at 7.
  • 9 and up — rigid adherence, often with blown-out contrast and a fried look. Reach for it only when the model keeps ignoring a key instruction, and expect to fix saturation afterward.

Steps set how many denoising passes run. More steps is not automatically better. Most modern samplers resolve a clean image in 20 to 35 steps. Beyond that you burn compute for changes you cannot see. Below 15 the image stays noisy and undercooked.
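Those bands are easy to encode as a sanity check in a batch script. A Python sketch; the thresholds are the rules of thumb from this section, not hard limits, and `describe_steps` is a hypothetical name.

```python
def describe_steps(steps: int) -> str:
    """Classify a step count using this guide's rules of thumb."""
    if steps < 15:
        return "undercooked: expect visible noise"
    if steps < 20:
        return "borderline: may still look soft or noisy"
    if steps <= 35:
        return "typical: most modern samplers resolve cleanly here"
    return "diminishing returns: more compute, little visible change"

for n in (10, 18, 28, 60):
    print(n, describe_steps(n))
```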

Samplers are the integration method. A few you will actually use:

  • DPM++ 2M Karras — the safe default. Sharp, consistent, converges fast. Great at 25 to 30 steps.
  • Euler a — the “a” means ancestral, so it keeps injecting noise and never fully settles. Output keeps shifting with step count, which makes it lively and a little unpredictable. Nice for illustration and concept work.
  • DPM++ SDE Karras — detailed and textured, slower. Good for final renders where you want extra grit.
  • DDIM — fast, deterministic, a bit softer. Handy for quick iteration.
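If you script renders with Hugging Face diffusers instead of a GUI, these sampler names correspond to scheduler classes. The Python sketch records that correspondence for three of them, following the mapping given in the diffusers scheduler documentation; confirm it against the version you have installed. DPM++ SDE Karras is left out because its diffusers equivalent is worth checking in those docs first. `SAMPLERS` and `set_sampler` are hypothetical helpers; `from_config` and the scheduler classes are real diffusers APIs.

```python
# Front-end sampler name -> (diffusers scheduler class name, constructor overrides).
SAMPLERS = {
    "DPM++ 2M Karras": ("DPMSolverMultistepScheduler", {"use_karras_sigmas": True}),
    "Euler a": ("EulerAncestralDiscreteScheduler", {}),
    "DDIM": ("DDIMScheduler", {}),
}

def set_sampler(pipe, name: str):
    """Swap the scheduler on an already-loaded diffusers pipeline, keeping its config."""
    import diffusers  # imported lazily so the table above is usable without diffusers

    class_name, overrides = SAMPLERS[name]
    scheduler_cls = getattr(diffusers, class_name)
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config, **overrides)
    return pipe
```

Building the new scheduler from the pipeline's existing config keeps the model's noise-schedule settings and changes only the integration method, which matches how the front ends treat the sampler dropdown.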

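A practical loop: draft on Euler a at 20 steps to explore compositions fast, then lock your favorite seed and render the final on DPM++ 2M Karras at 28 to 30 steps with CFG 6 to 7. Expect the final to resemble the draft rather than match it exactly: the same seed gives the same starting noise, but a different sampler and step count take a different path from it.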
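The two-stage loop maps neatly onto a small settings record. A Python sketch; the `Render` dataclass, its fields and `finalize` are hypothetical, and the defaults simply restate the numbers above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Render:
    prompt: str
    seed: int
    sampler: str = "Euler a"  # fast, exploratory drafts
    steps: int = 20
    cfg_scale: float = 7.0

def finalize(draft: Render, steps: int = 28, cfg_scale: float = 6.5) -> Render:
    """Keep the draft's prompt and seed; switch to the final-render sampler settings."""
    return replace(draft, sampler="DPM++ 2M Karras", steps=steps, cfg_scale=cfg_scale)

draft = Render(prompt="portrait of a weathered fisherman, yellow raincoat", seed=424242)
final = finalize(draft, steps=30)
print(draft)
print(final)
```

Freezing the dataclass means the draft record survives unchanged, so you keep a log of which seed and settings produced which image.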

Quality tags, and why fewer is often more

Tags like masterpiece, best quality, highly detailed, 8k, sharp focus nudge the model toward training images that carried those labels. They help, in moderation. Stacking fifteen quality tags does not make the image fifteen times better; it crowds out your actual subject and flattens the result. Pick three or four that match the look you want and move on. The descriptive tokens, the lens and the lighting do the real work.
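To enforce that cap in a script, a helper can keep the first few quality tags it sees and drop the rest. A Python sketch; the tag set and `cap_quality_tags` are hypothetical, and splitting on commas assumes no weighted group contains a comma.

```python
# Hypothetical sample of common quality tags; extend it to match your own habits.
QUALITY_TAGS = {"masterpiece", "best quality", "highly detailed", "ultra detailed",
                "8k", "4k", "sharp focus"}

def cap_quality_tags(prompt: str, max_tags: int = 4, quality_tags=QUALITY_TAGS) -> str:
    """Keep every non-quality token, plus only the first `max_tags` quality tags."""
    kept, seen = [], 0
    for part in (p.strip() for p in prompt.split(",")):
        if not part:
            continue
        if part.lower() in quality_tags:
            if seen >= max_tags:
                continue  # over the cap: drop this tag
            seen += 1
        kept.append(part)
    return ", ".join(kept)

print(cap_quality_tags("weathered fisherman, masterpiece, best quality, 8k, "
                       "sharp focus, highly detailed, ultra detailed", max_tags=3))
```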

The fastest way to improve a Stable Diffusion prompt is to delete words. Cut the prompt in half, see what breaks, and you will discover which tokens were actually carrying the image.
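A finer-grained variant of cutting in half is leave-one-out: render the prompt once per token group with that group removed, on a fixed seed, and compare each result against the full prompt. A Python sketch that generates those variants; `ablations` is a hypothetical helper, and it splits naively on commas, so a weighted group containing a comma would be broken apart.

```python
from typing import Iterator

def ablations(prompt: str) -> Iterator[tuple[str, str]]:
    """Yield (removed_group, prompt_without_it) for each comma-separated group."""
    parts = [p.strip() for p in prompt.split(",") if p.strip()]
    for i, removed in enumerate(parts):
        yield removed, ", ".join(parts[:i] + parts[i + 1:])

for removed, variant in ablations("weathered fisherman, yellow raincoat, harbor at dawn"):
    print(f"without '{removed}': {variant}")
```

Any variant that looks no worse than the full prompt marks a group that was not carrying the image.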

Putting it together

Stable Diffusion prompts reward structure and restraint. Lead with the subject, weight the two or three tokens that matter with values between 0.7 and 1.5, keep CFG around 7, render at 25 to 30 steps on DPM++ 2M Karras, and pair every positive prompt with a focused negative. Change one variable at a time and keep notes on your seeds. Do that and your hit rate climbs from lucky-now-and-then to repeatable. When you want a faster starting point, the prompt generator at ArtPrompts Generator gives you a clean, weighted skeleton you can paste and tune in seconds.