Make Critique Spatial.

Most AI “design feedback” is astrology: flattering, non-falsifiable, and strangely confident.

It tells you the interface “could improve hierarchy” the way a horoscope tells you to “guard your energy.” True enough to pass, useless enough to ignore.

Real critique has coordinates.

Critique must point

If feedback can’t answer where, it can’t answer what to do next.

Design critique is not a mood. It’s a diagnosis: the violated standard, the gap, and the smallest correction that closes it. Anything else is commentary.

This is why the most valuable shift in AI-assisted design work isn’t “better generation,” it’s better critique loops—especially now that multimodal models can look at screenshots and pretend they’re being helpful.

Pretend isn’t enough.

A loop that stays honest

Here’s a short loop that reliably produces usable output:

Cap the critique at 3 items. More than that is pedantry dressed as rigor.
Force the triad: expected standard → gap → smallest fix.
Require a region. A bounding box, a selector, a component name—anything that pins the critique to a place.
Label severity. Cosmetic / confusing / blocking. If it’s cosmetic, say so.
Ban research cop-outs. “Test with users” is not critique. It’s a way to avoid making a call.
Prefer edits over rewrites. Change one thing, not ten. Rerun the loop.
Capture non-visual requirements. Accessibility, interaction behavior, state logic. Put intent where engineers (and agents) can actually see it.

Two prompts to keep the model from becoming a poet

1) The grounded critique.

You are a strict design critic. Do not give generic advice.
 
Given: a UI screenshot (or a link) and the intended user task.
Return exactly 3 critiques.
 
Each critique must include:
1) Region: describe where (component name + location, or coordinates if available)
2) Expected standard: one sentence
3) Gap: one sentence
4) Smallest fix: one sentence
5) Severity: cosmetic / confusing / blocking
 
Constraints:
- No “test with users” advice.
- No replatforming suggestions.
- No rewriting the whole layout.

2) The surgical patch list.

Convert the critiques into a patch list.
 
Rules:
- Max 7 patches.
- Each patch is one sentence starting with a verb.
- Order by impact (highest first).
- Each patch must be local (touch one component/region).

Why this works

The triad is doing something quietly brutal: it forces commitment.

“Improve hierarchy” is not a commitment. “Increase label contrast to meet readability; reduce competing emphasis in the card body; keep the primary button as the only high-saturation element” is.

And pinning critique to a region prevents the model from doing its favorite magic trick: producing advice that would apply to any interface on the internet.

If your feedback still reads like it could have been copy-pasted onto a random screen, you don’t need a smarter model. You need stricter questions.

If the critique can’t point, it can’t help.