Most people type a few words into Midjourney, get something muddy, and assume the model is the problem. It almost never is. Knowing how to write AI art prompts is the difference between a flat, generic render and an image that looks deliberate. The model is a very literal collaborator. It does exactly what you describe, and nothing you forget to mention.
This guide walks through the full process I use daily across Midjourney, Stable Diffusion, DALL·E 3, and Flux. No theory for its own sake. Just the order of operations that consistently produces sharper results, with prompts you can copy and run right now.
What a prompt actually does
A text-to-image model converts your words into a point in a huge visual space, then denoises an image toward that point. Every token nudges the result. Vague words land you in a crowded, average region — that is where “trending on artstation, beautiful, masterpiece” lives, and why those words now read as noise. Specific words push you somewhere distinct.
So the core skill is reducing ambiguity. You are not writing a poem. You are giving directions to someone who cannot ask follow-up questions.
The six-part structure I start from
Before touching parameters, get the words right. I think in six slots, roughly in priority order:
- Subject — who or what, and what they are doing.
- Composition — shot type, angle, framing.
- Environment — where it happens, time of day.
- Lighting — the single biggest lever for mood.
- Style / medium — photo, oil painting, 3D render, specific aesthetic.
- Technical — lens, film stock, resolution cues, then model parameters.
You do not need all six every time. But when an image feels off, the fix is almost always a slot you left empty. Here is a bare subject prompt versus a fully specified one in Midjourney:
a woman in a coat --ar 3:2

a woman in a long charcoal wool coat walking away from camera,
narrow rain-slicked alley in old Lisbon at dusk, wet cobblestones
reflecting warm shopfront light, low three-quarter rear angle,
35mm lens, shallow depth of field, soft overcast key light,
muted teal and amber palette, photojournalistic --ar 3:2 --style raw --v 6.1
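Same model, same subject. The second one has a point of view. Notice the order: subject first, then the world around it, then how it is framed and shot, then light, then palette and treatment. The six slots are a checklist, not a strict sequence, so environment can come before framing as long as the subject leads.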
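If you build prompts often, it can help to treat the six slots as a small data structure. The sketch below is one way to do that, not a required tool: `PromptSlots`, `build_prompt`, and `empty_slots` are hypothetical names invented for illustration, and the slot order simply follows the list above.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptSlots:
    """The six slots from this guide, in priority order. Empty slots are skipped."""
    subject: str
    composition: str = ""
    environment: str = ""
    lighting: str = ""
    style: str = ""
    technical: str = ""


def build_prompt(slots: PromptSlots, params: str = "") -> str:
    """Join the filled slots with commas, then append model parameters."""
    parts = [getattr(slots, f.name).strip() for f in fields(slots)]
    text = ", ".join(p for p in parts if p)
    return f"{text} {params}".strip()


def empty_slots(slots: PromptSlots) -> list[str]:
    """Names of slots left blank: the first place to look when an image feels off."""
    return [f.name for f in fields(slots) if not getattr(slots, f.name).strip()]


alley = PromptSlots(
    subject="a woman in a long charcoal wool coat walking away from camera",
    composition="low three-quarter rear angle",
    environment="narrow rain-slicked alley in old Lisbon at dusk",
    lighting="soft overcast key light",
    style="photojournalistic",
    technical="35mm lens, shallow depth of field",
)
print(build_prompt(alley, "--ar 3:2 --style raw --v 6.1"))
print(empty_slots(PromptSlots(subject="a woman in a coat")))
```

The second call lists every slot the bare prompt leaves empty, which is a quick way to see what the model has been left to guess.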
Lighting is the lever beginners ignore
If I could only add one descriptor to a weak image, it would be lighting. “Golden hour backlight,” “harsh midday sun,” “single softbox from camera left,” “moody chiaroscuro,” “neon rim light” — each completely changes the emotional read. Compare:
portrait of an old fisherman, weathered face --ar 4:5 --v 6.1

portrait of an old fisherman, weathered face, deeply lined skin,
dramatic side lighting from a small window, deep shadows,
Rembrandt lighting, dark muted background, 85mm, f/1.8 --ar 4:5 --v 6.1
The second reads like a gallery portrait. The only meaningful change is that the light now has a direction and a name.
Each model speaks a different dialect
This trips up almost everyone. The same prompt does not translate cleanly across tools because they were trained differently and they parse syntax differently.
Midjourney
Comma-separated phrases plus parameters. It rewards strong style words and punishes overlong, rambling prompts. Key parameters: --ar (aspect ratio), --stylize or --s (0–1000, how hard it applies its house aesthetic), --chaos or --c (0–100, variety across the grid), --weird (0–3000, off-kilter results), and --style raw to dial back the default Midjourney “look.” The examples in this guide pin the model with --v 6.1; use --niji 6 for anime.
1970s sci-fi paperback cover, lone astronaut on a red dune,
two moons, retro-futurism, painted illustration, grainy print
texture, bold orange and rust palette --ar 2:3 --stylize 250 --v 6.1
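Because the parameter ranges are easy to mistype, a small helper can catch out-of-range values before you paste a prompt. This is a minimal sketch: `mj_params` and `RANGES` are hypothetical names, and the ranges are the ones quoted in this guide, so check them against the current Midjourney documentation.

```python
# Ranges as quoted in this guide; verify against the current Midjourney docs.
RANGES = {"stylize": (0, 1000), "chaos": (0, 100), "weird": (0, 3000)}


def mj_params(ar: str = "", raw: bool = False, version: str = "", **numeric: int) -> str:
    """Build a Midjourney parameter suffix, rejecting out-of-range values."""
    flags = []
    if ar:
        flags.append(f"--ar {ar}")
    for name, value in numeric.items():
        if name not in RANGES:
            raise ValueError(f"unknown parameter: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"--{name} must be between {low} and {high}, got {value}")
        flags.append(f"--{name} {value}")
    if raw:
        flags.append("--style raw")
    if version:
        flags.append(f"--v {version}")
    return " ".join(flags)


print(mj_params(ar="2:3", stylize=250, version="6.1"))
```

Calling `mj_params(chaos=150)` raises a `ValueError` instead of silently producing a flag the model may reject.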
Stable Diffusion
More technical, more controllable. You get explicit weighting with parentheses — (emphasis:1.3) strengthens a term, (term:0.7) weakens it — plus a separate negative prompt, a CFG scale (how strictly it follows the prompt, usually 4–9), step count, and a sampler. A typical SDXL setup:
Positive: cinematic photo of a snow leopard on a rocky ridge,
(golden hour:1.2), telephoto compression, sharp fur detail,
shallow depth of field, national geographic style, 8k
Negative: blurry, low quality, deformed, extra limbs, watermark,
text, oversaturated, cartoon
Steps: 30 | CFG: 6.5 | Sampler: DPM++ 2M Karras
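The weighting syntax is plain text, so it is easy to generate. The sketch below assumes a front end that parses the (term:weight) form described above; the syntax is handled by the interface, not by the model itself, so it will not work everywhere. `weighted` and `sd_prompt` are hypothetical helper names, and the `settings` keys are illustrative labels rather than the field names of any particular UI or API.

```python
def weighted(term: str, weight: float = 1.0) -> str:
    """Format a term in (term:weight) syntax; a weight of 1.0 leaves it bare."""
    if weight == 1.0:
        return term
    return f"({term}:{weight:g})"


def sd_prompt(terms: list[tuple[str, float]]) -> str:
    """Join (term, weight) pairs into one comma-separated prompt string."""
    return ", ".join(weighted(term, w) for term, w in terms)


positive = sd_prompt([
    ("cinematic photo of a snow leopard on a rocky ridge", 1.0),
    ("golden hour", 1.2),
    ("telephoto compression", 1.0),
    ("sharp fur detail", 1.0),
])

# Values mirror the SDXL example above; the key names are made up for this sketch.
settings = {"steps": 30, "cfg_scale": 6.5, "sampler": "DPM++ 2M Karras"}

print(positive)
```

Keeping weights as numbers until the last moment makes it simple to nudge one term from 1.2 to 1.3 and compare results.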
DALL·E 3
Plain natural language. No parameter flags, no weighting syntax — it does best with full descriptive sentences and actually rewards you for explaining intent. It is the strongest of the three at rendering legible text in an image.
A cozy isometric illustration of a tiny bookshop interior, warm
lamplight, overflowing wooden shelves, a sleeping orange cat on
the counter, soft muted colors, gentle storybook style. The
hanging sign above the door reads "Margin Notes" in hand-lettered
serif type.
If you want a deeper breakdown per tool, the dedicated guides on Midjourney prompts, Stable Diffusion prompts, and DALL·E 3 prompts go further than I can here.
Negative prompts: say what you do not want
In Stable Diffusion (and Flux to a lesser degree), the negative prompt is half the craft. It is where you exile the artifacts: extra fingers, fused limbs, blurry, jpeg artifacts, watermark, text, bad anatomy. Midjourney has no separate field, but --no does similar work — --no text, logo keeps stray lettering out. Do not stuff the negative with hundreds of tokens; a focused list works better than a wall. The full breakdown lives in our negative prompts guide.
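One way to keep a negative list focused is to clean it in code and reuse it across tools. This is a sketch under assumptions: the helper names are hypothetical, and the default cap of 12 terms is an arbitrary illustration of "a focused list", not a recommended number.

```python
def clean_terms(terms: list[str], limit: int = 12) -> list[str]:
    """Lowercase, strip, and de-duplicate terms in order, then cap the list."""
    seen = set()
    cleaned = []
    for term in terms:
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned[:limit]


def sd_negative(terms: list[str]) -> str:
    """Negative prompt string for a Stable Diffusion negative field."""
    return ", ".join(clean_terms(terms))


def mj_no(terms: list[str]) -> str:
    """Midjourney has no negative field, so the same list becomes a --no flag."""
    cleaned = clean_terms(terms)
    return f"--no {', '.join(cleaned)}" if cleaned else ""


print(sd_negative(["blurry", "Blurry ", " watermark", "text"]))
print(mj_no(["text", "logo"]))
```

The de-duplication step matters more than it looks: pasted negative lists tend to accumulate repeats, which add length without adding information.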
Match the aspect ratio to the subject
This is a small setting with an outsized effect, and beginners leave it on the default. The aspect ratio does not just crop — it tells the model what kind of composition to plan. A wide frame invites horizon and breathing room; a tall frame invites a standing figure. Set it before anything else:
- --ar 16:9 or 3:2 — landscapes, cinematic scenes, banners.
- --ar 4:5 or 2:3 — portraits and full-body characters.
- --ar 1:1 — avatars, icons, album-cover layouts.
Generate a sweeping vista at 1:1 and you have asked the model to cram a wide idea into a box. It will, and it will look cramped. The fix costs nothing — you just have to remember to set it.
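Midjourney takes the ratio directly through --ar, but Stable Diffusion interfaces usually ask for a width and height in pixels. The sketch below converts a ratio into dimensions. It assumes a target of roughly one megapixel and sides that are multiples of 64, which is a common convention for SDXL-class models but not a rule; both values are parameters so you can change them for your model. `sd_dimensions` is a hypothetical helper name.

```python
import math


def sd_dimensions(ratio: str, target_pixels: int = 1024 * 1024, multiple: int = 64) -> tuple[int, int]:
    """Turn a ratio like '16:9' into (width, height) near target_pixels.

    Both sides are rounded to the nearest `multiple`, so the final ratio
    is approximate rather than exact.
    """
    w, h = (int(part) for part in ratio.split(":"))
    if w <= 0 or h <= 0:
        raise ValueError(f"invalid ratio: {ratio}")
    # Scale factor that makes (w * scale) * (h * scale) equal target_pixels.
    scale = math.sqrt(target_pixels / (w * h))
    width = max(multiple, round(w * scale / multiple) * multiple)
    height = max(multiple, round(h * scale / multiple) * multiple)
    return width, height


for ratio in ("16:9", "4:5", "1:1"):
    print(ratio, sd_dimensions(ratio))
```

Because of the rounding, a requested 16:9 comes out as 1344 by 768, which is close to the ratio rather than an exact match.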
Reading reference images: borrow, do not copy
Once words stop getting you closer, bring in an image. Midjourney accepts an image URL at the front of a prompt and blends its look into the result; you can also use --sref for style reference and --cref to carry a character’s identity across a set. Stable Diffusion goes further with ControlNet, which can lock pose, depth, or edges from a control image. The point is not to clone the reference — it is to give the model an anchor for something language cannot pin down, like a precise face or a specific lighting setup. Use a reference to fix what your words keep missing, then keep describing everything else.
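The ordering described above (image URL first, then text, then reference flags) can be sketched as a small string builder. The function name is hypothetical and the example.com URLs are placeholders, not real references; swap in your own hosted images.

```python
def mj_with_references(text: str, image_urls=(), sref: str = "", cref: str = "", params: str = "") -> str:
    """Image URLs go first, then the text prompt, then reference flags and parameters."""
    parts = list(image_urls) + [text]
    if sref:
        parts.append(f"--sref {sref}")  # style reference
    if cref:
        parts.append(f"--cref {cref}")  # character reference
    if params:
        parts.append(params)
    return " ".join(parts)


print(mj_with_references(
    "portrait of an old fisherman, weathered face",
    image_urls=["https://example.com/ref.jpg"],      # placeholder URL
    sref="https://example.com/style.jpg",            # placeholder URL
    params="--ar 4:5",
))
```

Note that the text prompt still carries the description; the references only supply what the words keep missing.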
Iterate like an editor, not a gambler
The biggest mistake is re-rolling the same prompt hoping for luck. Instead, change one variable at a time. Lock everything, swap only the lighting. Then only the lens. Then only the palette. You learn what each word is worth, and you can reproduce a win. Use a fixed --seed in Midjourney or a fixed seed in SD when you want to isolate a single change.
One controlled change per generation teaches you more than fifty random re-rolls. Treat every image as an experiment with exactly one variable.
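The one-variable rule is easy to enforce if the prompts are generated rather than retyped. A minimal sketch, assuming Midjourney-style flags: `BASE`, `render`, and `one_variable_variants` are hypothetical names, and the seed value is arbitrary.

```python
BASE = {
    "subject": "portrait of an old fisherman, weathered face",
    "lighting": "Rembrandt lighting",
    "lens": "85mm",
    "palette": "dark muted background",
}


def render(slots: dict[str, str], seed: int, params: str = "--ar 4:5") -> str:
    """Join slot values in insertion order and pin the seed."""
    return f"{', '.join(slots.values())} {params} --seed {seed}"


def one_variable_variants(base: dict[str, str], slot: str, options: list[str], seed: int) -> list[str]:
    """Return one prompt per option, changing only `slot` and keeping the seed fixed."""
    if slot not in base:
        raise KeyError(f"unknown slot: {slot}")
    return [render({**base, slot: option}, seed) for option in options]


for prompt in one_variable_variants(BASE, "lighting", ["golden hour backlight", "neon rim light"], seed=1234):
    print(prompt)
```

Every prompt in the output shares the same subject, lens, palette, and seed, so any difference between the images can be attributed to the lighting term.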
A repeatable workflow
- Write the subject in one clear line.
- Add composition and environment.
- Add named lighting.
- Add a specific style or medium — not three competing ones.
- Add technical cues (lens, film, render engine).
- Set parameters and generate four.
- Pick the closest, then change ONE thing and regenerate.
Run that loop three or four times and you will land somewhere a single prompt never could. If you want a tighter mental model for slot order and weighting, read the companion piece on the AI art prompt formula when you are ready to systematize this.
Where to go next
Once the structure clicks, prompting stops feeling like superstition. You will know why an image works and how to push it further. Pick a single subject today, run it through the six slots, and change one variable per generation. That habit alone will move you past most people who have used these tools for a year.















