
The 80-Message UX Iteration: Where AI Helps and Where It Falls Short

Paul Allington · 7 April 2026 · 8 min read

The search engine took twenty minutes. I am not exaggerating. Twenty minutes from "we need semantic search" to a working implementation using sentence transformers, all-MiniLM-L6-v2 via ONNX, returning genuinely intelligent results. You type "how do I handle a complaint" and it finds the grievance procedure document even though the word "complaint" doesn't appear anywhere in it. Proper semantic understanding, running locally, no API calls.

Then I spent eighty messages trying to make the search results page not look terrible.

This post is about that ratio. Because I think it tells you something fundamental about where AI is transformative and where it's just... adequate.

The Technical Bit (The Easy Part)

The project was a service library for a client. Hundreds of services, each with descriptions, categories, delivery methods, and tags. The existing search was keyword-based and awful. Search for "mental health support" and you'd only find results that contained those exact words. Anything filed under "wellbeing" or "counselling" or "emotional support" was invisible.

Semantic search fixes this. Instead of matching keywords, it matches meaning. The sentence transformer model converts text into vectors - numerical representations of meaning - and then you compare vectors to find similar content. Two phrases can have completely different words but nearly identical vectors if they mean similar things.

Claude built the whole pipeline: ONNX runtime integration, vector generation for the service library, cosine similarity search, result ranking. It even handled the tokenisation edge cases and the batch processing for generating embeddings across the full library. Working, tested, deployed in a single session. This is AI at its best - technically complex work executed cleanly.
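To make the shape of that pipeline concrete, here's a minimal sketch of embedding-based search. This is not the project's code: real all-MiniLM-L6-v2 embeddings have 384 dimensions and come from an ONNX inference session, whereas these are hand-made three-dimensional toy vectors, and the function names are mine.

```python
import math

def cosine_similarity(a, b):
    """Compare two embedding vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" standing in for model output.
library = {
    "Grievance procedure": [0.9, 0.1, 0.0],
    "Cycle to work scheme": [0.0, 0.2, 0.9],
}

def search(query_vector, top_k=5):
    """Rank every service in the library by similarity to the query."""
    scored = [(cosine_similarity(query_vector, vec), name)
              for name, vec in library.items()]
    return [name for score, name in sorted(scored, reverse=True)[:top_k]]

# A query like "how do I handle a complaint" embeds close to the
# grievance document, even with no shared keywords.
complaint_query = [0.8, 0.3, 0.1]
print(search(complaint_query))  # ['Grievance procedure', 'Cycle to work scheme']
```

The whole trick is in the ranking step: no keyword index, just nearest-neighbour comparison in vector space, which is why "complaint" finds "grievance".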

"Make the Debounce About 200ms"

Then the design work started. And it didn't stop for eighty messages.

The first thing was the search input itself. It needed to feel responsive - search-as-you-type, no submit button, results updating live. But not too responsive, because hitting the semantic search endpoint on every keystroke would hammer the server.

"Make the debounce about 200ms."

Simple request. Claude implemented it. Then the results flickered on every keystroke because the loading state wasn't handled properly. Then the loading state caused a layout shift because the results container resized. Then the resize was fixed but the animation felt jerky. Four messages for a debounce timer. This is the reality of UI work.
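The debounce idea itself is simple, which is exactly the point. The real thing was client-side JavaScript, but here's a Python sketch of the same mechanism (the `Debouncer` class and 200ms figure aside, everything here is illustrative):

```python
import threading
import time

class Debouncer:
    """Delay a call until input has been quiet for `wait` seconds.

    Each new call cancels the pending one, so a burst of keystrokes
    collapses into a single search request once typing pauses.
    """
    def __init__(self, wait, fn):
        self.wait = wait
        self.fn = fn
        self._timer = None

    def call(self, *args):
        if self._timer is not None:
            self._timer.cancel()  # a new keystroke resets the clock
        self._timer = threading.Timer(self.wait, self.fn, args)
        self._timer.start()

# Simulate typing "men", "ment", "mental" in quick succession:
queries = []
debounced = Debouncer(0.2, queries.append)
for text in ["men", "ment", "mental"]:
    debounced.call(text)

time.sleep(0.3)   # wait out the quiet period
print(queries)    # ['mental'] - three keystrokes, one search
```

Note what the sketch doesn't cover: the loading state, the layout shift, the jerky animation. The timer was one message; everything around it was the other three.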

The Tag Situation

Each service had tags - categories like "Mental Health", "Housing", "Employment", "Children and Families". These needed to display as small pills beneath each search result. Should be simple.

"The tags should be smaller. Pill-shaped. As small as we can get, really, as some will have aloooooads."

First attempt: tags the size of buttons. Second attempt: tags that were small but had so much padding they wrapped to three lines. Third attempt: tags that were genuinely small but had truncated text so you couldn't read them. Fourth attempt: tags that looked right on desktop but overlapped on mobile.

We eventually landed on a design that worked. Small, compact pills with a subtle background colour per category, wrapping neatly when there were many, with a "+ 5 more" overflow when there were too many to display. It looks effortless. It was not effortless. It was six or seven rounds of iteration on what is fundamentally a list of small coloured rectangles.
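For what it's worth, the overflow count itself is trivial logic - the six or seven rounds were all about the CSS around it. A sketch of the "+ N more" behaviour (the threshold of six and the function name are mine, not the project's):

```python
def visible_tags(tags, max_visible=6):
    """Cap the pill list, collapsing the remainder into '+ N more'."""
    if len(tags) <= max_visible:
        return list(tags)
    return tags[:max_visible] + [f"+ {len(tags) - max_visible} more"]

tags = ["Mental Health", "Housing", "Employment", "Children and Families",
        "Debt Advice", "Carers", "Older People", "Young People", "Benefits",
        "Volunteering", "Training"]
print(visible_tags(tags))
# ['Mental Health', 'Housing', 'Employment', 'Children and Families',
#  'Debt Advice', 'Carers', '+ 5 more']
```

Ten lines of logic, a week of padding, wrapping, and breakpoint arguments.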

"No, It's Crap"

Somewhere around message forty, I lost patience. The search results page had gone through so many iterations that it had accumulated contradictory styling. The card layout was a Frankenstein of three different design directions. The sidebar filters had been added, removed, re-added in a different position, and were now floating in a way that worked on one screen size and collapsed into nonsense on every other.

"No, it's crap. It's a total mess. Why can't you get this right?"

I'll be honest with you - this wasn't entirely fair. Each individual change Claude made was reasonable. The problem was that forty incremental changes, each responding to a specific piece of feedback, had created a design that no one had ever looked at holistically. It's the design equivalent of technical debt: each commit is fine, but the cumulative effect is chaos.

This is a pattern I've noticed repeatedly with AI-driven UI work. AI is excellent at responding to specific feedback and terrible at maintaining a coherent design vision across many iterations. It doesn't step back and say "we've made thirty changes and the overall design has drifted - should we start this section fresh?" It just keeps applying patches to patches.

When I Grabbed the CSS Myself

After the "it's crap" message, I did something I hadn't done in weeks: I opened the CSS file and made changes myself. Not because Claude couldn't make CSS changes - it obviously could - but because I needed to see the layout with my own eyes and adjust it with direct visual feedback.

I spent about thirty minutes restructuring the results layout. Fixed the card grid, sorted the spacing, got the sidebar sitting where it should be. Then I handed it back to Claude and said "review what I've done and refine it."

This worked beautifully. Claude cleaned up my slightly rough CSS, added the responsive breakpoints I'd been too lazy to think about, and handled the edge cases - empty states, loading states, error states. The combination of human visual judgement and AI thoroughness produced a better result than either could have alone.

The lesson: sometimes the fastest path through a design problem is for the human to sketch the layout and the AI to finish it. Not the other way round.

Four Context Resets

The conversation hit context limits four times. Four times. For a search results page.

Each time, Claude lost the thread of what we'd already tried, what worked, and what didn't. I'd have to re-explain the design direction, re-establish the constraints, and watch it briefly attempt approaches we'd already abandoned. By the fourth reset, I had a bullet-point summary ready to paste at the start of each new context: "Here's what we've decided. Here's what works. Don't change these things. Focus on this specific problem."

Context limits are a practical constraint that nobody talks about honestly. Yes, the context windows are getting bigger. Yes, Claude Code handles long conversations better than the chat interface. But when you're doing intensive design iteration - where every message builds on the previous one and the full history of decisions matters - hitting the limit feels like your collaborator has sudden amnesia. You're not starting from zero, but you're spending fifteen minutes rebuilding context that shouldn't have been lost.

The Feature Creep Within the Conversation

Across eighty messages, the scope expanded. What started as "search results page" became search results with autocomplete suggestions, category filtering, delivery method toggles, a sticky sidebar, card styling with hover effects, tag displays, result counts, pagination, and a "no results" state with suggested alternative searches.

Each addition was individually justified. Autocomplete makes search better. Category filters are expected. The sticky sidebar improves usability on long result lists. But the cumulative effect was that a single conversation was trying to design and implement an entire search experience, and the conversation format isn't built for that. It should have been five focused conversations, not one sprawling one.

I know this now. I didn't know it at message twenty, when adding "just one more thing" still felt manageable.

The Scorecard

Here's my honest assessment of how AI performed across the different parts of this project.

Semantic search implementation: Exceptional. Twenty minutes. Clean code. Correct approach. I wouldn't have known where to start with ONNX runtime integration and sentence transformer tokenisation. Claude did. This alone justified the entire project.

Search input and debounce behaviour: Good. Needed a few iterations but converged quickly. Functional UI components with clear specifications are AI's sweet spot.

Card layout and result display: Adequate. Got there eventually but required significant iteration. The starting point was always generic, and refining it to feel polished took effort.

Tag and filter design: Frustrating. Many rounds of back-and-forth for something that should be straightforward. AI struggles with "make it as small as possible but still readable" because that's a visual judgement, not a specification.

Overall page layout and design coherence: Poor until I intervened directly. Forty incremental changes had created a mess, and only human eyes and hands could sort it out. Claude was excellent at refining my fix but couldn't have arrived there through iteration alone.

Responsive design: Mixed. Individual breakpoints worked. But the interaction between multiple responsive elements - sidebar collapsing while cards reflow while filters stack - required holistic thinking that AI doesn't naturally do.

What This Means for How I Work

The 80-message conversation taught me something I keep coming back to: AI's strengths and weaknesses are not what most people think.

The common narrative is that AI is great at simple tasks and struggles with complex ones. In my experience, it's the opposite. AI built a semantic search engine using sentence transformers and ONNX runtime without breaking a sweat. That's genuinely complex engineering. Then it spent eighty messages failing to make tags look nice. That's genuinely simple design.

The difference isn't complexity. It's whether the task has a verifiable correct answer. Semantic search either returns relevant results or it doesn't. You can test it. CSS either looks right or it doesn't, and "looks right" is a human judgement that no test can capture.

I've adjusted my workflow accordingly. For anything with a clear specification - algorithms, data processing, API integration, business logic - I trust AI to get it right in one or two passes. For anything visual - layout, spacing, typography, the overall feel of a page - I budget for heavy iteration and plan to get my hands dirty when the iteration stalls.

The search page is brilliant now, by the way. Users love it. The semantic matching surfaces results they'd never have found with keyword search, and the interface is clean and fast. Eighty messages was a lot. But the result was worth it. I just wish someone had warned me that the hard part wouldn't be the sentence transformers.

Want to talk?

If you're on a similar AI journey or want to discuss what I've learned, get in touch.

paul@thecodeguy.co.uk