UI/UX & Visual Testing

Section 1 — What UI and UX Actually Mean

Let me be upfront: I spent about a year using UI and UX as if they meant the same thing. Most people do. You hear the phrase thrown around in stand-ups, in job descriptions, in client briefs — and it starts to blur into one fuzzy concept called ‘design’. But the moment you sit down to actually test a product, or hand off a spec to a developer, the difference starts to matter a lot.

UI — The Part You Can See and Touch

UI stands for User Interface. That sounds official, but it really just means: everything you interact with on screen. The sign-in button. The colour of the error message. The way a card component casts a shadow. The amount of white space separating two blocks of text. All of it — that is UI.

Designers spend real time arguing over whether a button should be 36px or 40px tall. And honestly? That argument matters. A button that is too small gets missed on mobile. One that is too wide looks clunky on desktop. These details are the substance of UI work, not decoration.

Figure 1 — UI handles what you see; UX handles how the whole thing feels

UX — The Journey, Not the Destination

UX is trickier to pin down because you can’t point at it on screen. User Experience is the feeling someone gets as they move through your product. Can they find what they’re looking for without backtracking? Does the confirmation email arrive fast enough that they don’t think the form broke? Does the error message actually tell them what went wrong, or does it just say ‘something went wrong — try again’?

Here’s a useful way to think about it: UI is the look of a restaurant — lighting, table settings, menu design. UX is whether you got a table quickly, whether the waiter knew the specials, and whether you’d go back. Both matter. Neither is more important.

Where People Get Confused

The honest answer is: the roles blur in practice. At small companies, one person often owns both. At larger ones, a UX researcher might spend weeks doing interviews and usability tests and never touch a design file. A UI designer might obsess over component libraries without ever watching a real user struggle with a flow.

Here’s a rough side-by-side that might help:

UI (User Interface) | UX (User Experience)
Buttons, icons, colour, typography | Flows, navigation, usability
‘Does it look right?’ | ‘Does it work intuitively?’
Figma frames and component libraries | User research, journey maps, testing
Visible on screen | Felt across the whole session

Why This Actually Affects Product Quality

Bad UI makes people distrust your product before they’ve even used it. Bad UX makes them leave after they have. Both problems cost money — just at different points in the funnel. Early investment in solid UI/UX practice pays off in ways that are genuinely measurable:

– Support tickets drop when the interface explains itself clearly.

– Conversion rates climb when sign-up flows remove unnecessary friction.

– Users stay longer — and return more — when the experience doesn’t exhaust them.

– Accessibility improvements, which are a UX concern, often improve SEO as a side effect.

None of this is abstract theory. It shows up in A/B tests, in churn numbers, in app-store reviews. The teams who treat UI/UX as core engineering — not a final polish layer — tend to ship products people actually recommend.

Section 2 — Comparing Figma Designs to the Real App

Here’s something that surprised me when I first started working on design quality: even when a developer follows the Figma spec exactly, the live app rarely looks identical to the mockup. Not because anyone made a mistake — just because screens are complicated.

Why the Gap Exists

Figma renders everything in a controlled environment at a fixed resolution. Real devices don’t work that way. Fonts get hinted differently by the OS. Shadows that look crisp at 2x display density look blurry on a 1x screen. Safe-area insets on notched phones eat into layouts that were designed without them. And then there’s the content problem — your mockup had a nice short username, but real users have 47-character display names that break your carefully spaced header.

The goal of design-to-app comparison isn’t perfection. It’s catching the differences that a user would actually notice — the misaligned input field, the button that’s lost its border radius, the heading that switched to a system fallback font. 

Types of Mismatch Worth Tracking

Spacing and Alignment

This is usually the most common category. Margins that should be 16px end up at 12px. A form sits 8px too far to the left. An icon doesn’t quite line up with the label next to it. Individually small; collectively they make the app feel ‘off’ in a way users can sense even if they can’t articulate why.

Typography Drift

Font weights are especially sneaky. ‘Medium’ (500) and ‘Regular’ (400) look almost identical on a Figma canvas but noticeably different in a browser or on a physical device. Line-height mismatches cause text to run into adjacent elements. Missing font fallbacks can silently swap your brand typeface for a system serif.

Colour Inconsistency

RGB values can shift between colour spaces — what’s #2563EB in sRGB can look subtly different when rendered by a display in a different colour profile. More commonly, a developer types a hex code from memory and gets it slightly wrong. These things slip past code review.

Responsive Breakage

The design was built at 375px wide. Someone tested on a 390px device. But then it gets opened on a tablet in split-screen mode at 280px, and two buttons overlap. Responsive issues are hard to catch by hand — they almost need automated coverage across viewport sizes.
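
Automating that sweep is straightforward. Here is a minimal sketch using Playwright’s Python bindings; the staging URL, the breakpoint list, and the output paths are placeholder assumptions, not values from any real project:

```python
from playwright.sync_api import sync_playwright

# Hypothetical breakpoints: narrow split-screen, two common phones, a tablet.
VIEWPORTS = [(280, 653), (375, 812), (390, 844), (768, 1024)]

with sync_playwright() as p:
    browser = p.chromium.launch()
    for width, height in VIEWPORTS:
        page = browser.new_page(viewport={"width": width, "height": height})
        page.goto("https://staging.example.com/checkout")  # placeholder URL
        # Full-page capture so overflow and overlap below the fold show up too.
        page.screenshot(path=f"checkout_{width}x{height}.png", full_page=True)
        page.close()
    browser.close()
```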

Figure 2 — The five stages of a design-to-app comparison workflow

How to Actually Run a Comparison

The mechanics are straightforward once you’ve done it a couple of times:

1. Export the Figma frame as a PNG at the same resolution as your target device. For most mobile work that means 390 × 844.

2. Take a screenshot from the running app — same device, same viewport, same state (logged in, with real data if possible).

3. Pre-process: strip the status bar from both images, crop to identical bounds. Even a 1px difference in canvas size will throw off the comparison.

4. Run SSIM — Structural Similarity Index Measure. It gives you a score between 0 and 1, where 1 means structurally identical. Roughly anything above 0.85 is passing; below 0.80 usually means there’s a real problem visible to the naked eye.

5. Generate a diff overlay. Colour the regions of greatest divergence — red bounding boxes work well — and save the annotated image alongside the score.

One practical tip: mask any area that contains dynamic content before comparing. Profile photos, notification badges, timestamps — these will always differ between the design and a live screenshot, and they’ll drown out the signal you actually care about. 
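
Steps 3–5 reduce to a short Python script. Here’s a minimal sketch using OpenCV and scikit-image’s SSIM implementation; the file names, mask coordinates, and contour-area cut-off are placeholder assumptions:

```python
import cv2
from skimage.metrics import structural_similarity as ssim

design = cv2.imread("figma_frame.png")   # exported Figma frame
app = cv2.imread("app_screenshot.png")   # live capture, cropped to same bounds

# Mask dynamic regions (avatars, timestamps, badges) by painting them flat
# grey in BOTH images so they cannot contribute to the diff. The coordinates
# below are placeholders for wherever your dynamic content actually lives.
for (x, y, w, h) in [(16, 64, 48, 48)]:
    design[y:y + h, x:x + w] = 128
    app[y:y + h, x:x + w] = 128

gray_design = cv2.cvtColor(design, cv2.COLOR_BGR2GRAY)
gray_app = cv2.cvtColor(app, cv2.COLOR_BGR2GRAY)

# full=True returns a per-pixel similarity map alongside the overall score.
score, diff = ssim(gray_design, gray_app, full=True)
print(f"SSIM: {score:.3f}")  # judge against your threshold, e.g. 0.85

# Turn the similarity map into red bounding boxes around divergent regions.
diff = (diff * 255).astype("uint8")
_, thresh = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
annotated = app.copy()
for c in contours:
    if cv2.contourArea(c) > 40:  # skip sub-pixel noise
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite("diff_overlay.png", annotated)
```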

Where This Process Gets Tricky

Font rendering is the most persistent headache. macOS, Windows, and Android all anti-alias text differently, so even a perfect implementation will generate a non-zero SSIM diff in text-heavy regions. The practical fix is to run comparisons on a fixed device or emulator — one consistent rendering environment — rather than comparing across platforms.

Animation states are another trap. If your screenshot is captured mid-transition, the comparison is meaningless. You either need to wait for the UI to settle, or exclude animated components from visual regression tests and test them separately with interaction tests.
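
A practical way to force the UI to settle is to suppress motion before capturing. The sketch below uses Playwright’s Python API; the URL is a placeholder, and the blanket CSS override is a common trick rather than anything project-specific:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 390, "height": 844})
    # Ask the app to honour prefers-reduced-motion, if it supports it.
    page.emulate_media(reduced_motion="reduce")
    page.goto("https://staging.example.com/dashboard")  # placeholder URL
    # Belt and braces: disable CSS animations and transitions outright.
    page.add_style_tag(content=(
        "*, *::before, *::after {"
        " animation: none !important; transition: none !important; }"
    ))
    page.wait_for_load_state("networkidle")  # let async content finish loading
    page.screenshot(path="settled.png")
    browser.close()
```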

Section 3 — Automated UI Testing in Real Projects

Visual comparison by hand doesn’t scale. Once a codebase grows past a few dozen screens, manually checking every view after every release is practically impossible. That’s where automated UI testing comes in — and it’s more accessible than most teams realise.

What You’re Actually Testing

Before reaching for tools, it helps to be specific about what you want to catch. The most valuable checks in my experience are:

– Element positions — has a component shifted relative to its neighbours?

– Spacing consistency — are gutters and padding uniform across similar components?

– Colour accuracy — do computed background and text colours match the design tokens?

– Font rendering — correct family, weight, and size?

– Responsive behaviour — does the layout hold at the three or four breakpoints you care about?

That list maps to a combination of visual regression tests and DOM-inspection tests. You don’t need to do everything at once — start with the screens users hit most, then expand coverage as confidence builds.
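
For the colour and font checks in particular, you don’t even need screenshots. Here’s a DOM-inspection sketch using Playwright’s Python API; the selector, URL, and token values are hypothetical:

```python
from playwright.sync_api import sync_playwright

# Hypothetical design tokens; in practice, read these from your token file.
EXPECTED_BG = "rgb(37, 99, 235)"   # #2563EB
EXPECTED_FONT = "Inter"

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 390, "height": 844})
    page.goto("https://staging.example.com/signup")  # placeholder URL
    # Read the browser's computed styles, not the authored CSS.
    style = page.locator("button.primary").evaluate(  # hypothetical selector
        """el => {
            const s = getComputedStyle(el);
            return { bg: s.backgroundColor, font: s.fontFamily };
        }"""
    )
    assert style["bg"] == EXPECTED_BG, f"colour drift: {style['bg']}"
    assert EXPECTED_FONT in style["font"], f"fallback font in use: {style['font']}"
    browser.close()
```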

Figure 3 — Core tools used in an automated visual testing stack

The Tool Stack That Works

OpenCV

OpenCV is a computer-vision library that handles the heavy lifting of image operations — resizing, colour-space conversion, contour detection, and pixel-level diffing. In a UI testing context, you use it to pre-process screenshots and generate the difference masks that highlight mismatches. It’s fast and runs anywhere Python does.

SSIM

Structural Similarity Index Measure is an algorithm that compares images the way humans perceive them — considering luminance, contrast, and structure together rather than just counting different pixels. A raw pixel diff will scream about anti-aliased text even when nothing is wrong. SSIM is much less noise-prone, which means fewer false positives and faster triage.

Playwright

Playwright is a browser automation framework (it drives Chromium, Firefox, and WebKit) that has become a standard for web UI testing. Beyond driving user flows, it lets you capture full-page screenshots, emulate specific viewport sizes, and simulate reduced-motion preferences. The screenshots feed directly into your SSIM pipeline.

Appium

For native mobile — iOS and Android — Appium provides the same kind of programmatic control that Playwright gives you on the web. You can navigate to any screen, trigger any state, and capture a screenshot, all from a single test script. It’s verbose to set up, but once it’s running, coverage is comprehensive.
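
Capturing a comparable screenshot with the Appium Python client looks roughly like this; every capability value below is a placeholder for your own device setup:

```python
from appium import webdriver
from appium.options.android import UiAutomator2Options

# Placeholder capabilities for a local Android emulator session.
options = UiAutomator2Options().load_capabilities({
    "platformName": "Android",
    "appium:automationName": "UiAutomator2",
    "appium:deviceName": "Pixel_7_Emulator",  # hypothetical device name
    "appium:app": "/path/to/app.apk",         # hypothetical build artifact
})

driver = webdriver.Remote("http://localhost:4723", options=options)
try:
    # Navigate to the screen under test and trigger the state you need,
    # then capture it for the SSIM pipeline.
    driver.save_screenshot("app_screenshot.png")
finally:
    driver.quit()
```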

Thinking Beyond Pixel Comparison

Pixel comparison catches visual regressions. But it doesn’t tell you whether the component hierarchy is correct — whether a card that looks fine actually has the right DOM structure, or whether a navigation element is keyboard-accessible. For that, you layer in:

– Accessibility audits (axe-core integrated with Playwright) — catch missing ARIA labels, low contrast, and focus-order issues; a minimal sketch follows this list.

– Snapshot testing — serialises the rendered component tree and diffs it against a stored snapshot. Great for catching accidental markup changes.

– Visual AI tools — newer tooling uses ML models to detect UI components semantically, so a button that moved three pixels doesn’t fail the test, but a button that disappeared entirely does.
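
To make the first of those concrete: one way to run an axe-core audit is to inject the library from a CDN and execute it through Playwright, as sketched below; the staging URL is a placeholder, and you would normally pin axe-core to an exact version:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://staging.example.com/signup")  # placeholder URL
    # Inject axe-core and run the audit; evaluate() awaits the promise.
    page.add_script_tag(url="https://cdn.jsdelivr.net/npm/axe-core@4/axe.min.js")
    results = page.evaluate("axe.run()")
    for violation in results["violations"]:
        print(violation["id"], violation["impact"], violation["description"])
    browser.close()
```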

The teams I’ve seen do this well treat visual tests the same as unit tests — they live in the same repository, run in the same pipeline, and failures block a merge. That culture shift matters more than any specific tool choice.

Wrapping Up

UI and UX are not buzzwords. They’re specific disciplines with specific outputs — and confusing them leads to products that look polished but frustrate users, or vice versa. The goal is always both: something that looks intentional and works effortlessly.

Getting from a Figma mockup to a live application without losing design quality requires deliberate process. Automated visual comparison — done right, with sensible thresholds and real device coverage — closes that gap systematically rather than hoping QA will catch it by eye.

None of this requires a massive team or expensive tools. A Python script using OpenCV and SSIM, a Playwright setup hitting your staging environment, and a clear threshold for what counts as a pass — that’s enough to transform visual quality from luck into process.

Solid UI + Thoughtful UX + Consistent Testing = Products People Trust