Page Object Model Explained: What It Is, How It Works, and Why Your Tests Need It

I remember the first time a UI change broke 34 tests at once. The developer had renamed one input field — just the ID, nothing visible changed for users — and my entire login suite went red. Two hours of find-and-replace later, everything was green again. A week after that, a different field changed. Another hour of fixes.

That’s when someone on the team mentioned Page Object Model. I’d heard the term before but assumed it was one of those architecture buzzwords people throw around without actually using. It’s not. It’s a specific, practical pattern that solves that exact problem — and once you’ve set it up properly, those two-hour locator-hunting sessions just stop happening.

This guide is what I wish I’d read before building my first automation suite without it.

The Maintenance Trap No One Warns You About

Here’s how test automation usually starts: you write a handful of tests, they all pass, you feel great about it. Locators go right into the test methods because that’s the most direct approach. Everything works.

Fast forward three months. You’ve got forty test files. Six of them touch the login page. All six contain something like:

driver.find_element(By.ID, 'usr').send_keys(username)
driver.find_element(By.ID, 'pwd').send_keys(password)
driver.find_element(By.ID, 'loginBtn').click()

Then the dev team updates the login form. ‘usr’ becomes ‘username’. ‘loginBtn’ becomes ‘login-submit’. Two locators changed. Six files are now broken. You open each one, hunt down the locators, and update them. It takes two hours. Nothing about this is technically difficult; it’s just tedious, error-prone work that shouldn’t exist.

This is the maintenance trap. And it’s not about being sloppy — it happens to careful people too, because the structure itself is the problem. When your tests own the locators, every UI change is everyone’s problem.

Figure 1: The same locator change breaks 4+ files without POM. With POM, you update one file and every test just works.

So What Actually Is Page Object Model?

Page Object Model is a design pattern where each page (or significant section) of your application gets its own class. That class holds two things: the locators for every element you interact with on that page, and the methods for every action a user can take there.

That’s it. That’s the whole idea.

The rule that makes it work is this: your tests never locate elements directly. They never call driver.find_element. They call methods on page classes, which handle all the locating internally. The test knows what to do. The page object knows how to do it.

One sentence version: page objects own locators, tests own assertions. If those two things are in the same file, you don’t have POM — you just have classes.

So instead of the login test containing driver.find_element calls, it looks like this:

def test_valid_login(driver):
    home = LoginPage(driver).login('[email protected]', 'pass123')
    assert home.get_welcome_message() == 'Welcome, Alice'

Two lines. No locators. When the login form changes, login_page.py gets updated, and this test, along with every other test that calls login(), keeps passing without a single change.

The Four Layers and Why Each One Exists

POM frameworks that actually hold up over time tend to organize into four layers. You’ll find slight variations in how people name them, but the responsibilities are pretty consistent.

Figure 2: Four layers, one responsibility each — this separation is what keeps POM maintainable as projects grow

Test Layer — the ‘what’

This is where your test scenarios live. What are we verifying? Valid login succeeds. Invalid password shows an error. Checkout with an empty cart redirects home. Test methods read like specs — they describe behavior, not mechanics.

A red flag at this layer: if you see driver.find_element, By.ID, or any locator string inside a test file, something has leaked from the page layer. Tests should be completely blind to how interactions happen. They just call page methods and make assertions.

Page Layer — the ‘how’

Page classes live here. One class per screen or major component. Each class stores its locators as class-level constants and exposes action methods. This is the layer that most people think of when they think of POM.

Something that trips people up early on: page objects shouldn’t make assertions. It’s tempting to add an assert inside a click method when you want to verify the click worked. Don’t. Return data instead. Let the test decide if that data is correct. Once you start putting assertions in page objects, you lose the ability to use the same action in different test contexts.

Base Layer — the shared toolkit

Every page class needs to find elements, wait for things to load, handle stale references when the DOM refreshes. Writing that logic in every page class is just a different version of the same duplication problem you started with.

The base layer — usually a BasePage class that everything else inherits from — centralizes all of that. One place for explicit waits. One place for scroll helpers. One place for screenshot-on-failure logic. When you find a bug in your wait implementation, you fix it once.
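
As a rough sketch of what such a base class might contain, the example below centralizes a single wait-then-find helper. To stay runnable without a browser it uses a hand-rolled polling loop and a FakeDriver stand-in, both of which are assumptions made for illustration. In a Selenium suite the body of find would more likely be WebDriverWait(self.driver, timeout).until(EC.visibility_of_element_located(locator)), and NoSuchElement here stands in for Selenium’s NoSuchElementException.

```python
import time


class NoSuchElement(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""


class BasePage:
    """Shared toolkit: page classes inherit find() instead of re-implementing waits."""

    DEFAULT_TIMEOUT = 2.0   # seconds; hypothetical default
    POLL_INTERVAL = 0.05    # short retry interval, not a fixed delay

    def __init__(self, driver):
        self.driver = driver

    def find(self, locator, timeout=None):
        """Retry the lookup until the element appears or the timeout expires."""
        limit = self.DEFAULT_TIMEOUT if timeout is None else timeout
        deadline = time.monotonic() + limit
        while True:
            try:
                return self.driver.find_element(*locator)
            except NoSuchElement:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"element {locator} not found in time")
                time.sleep(self.POLL_INTERVAL)


class FakeDriver:
    """Test double: the element only 'renders' on the third lookup."""

    def __init__(self, appears_after=3):
        self.calls = 0
        self.appears_after = appears_after

    def find_element(self, by, value):
        self.calls += 1
        if self.calls < self.appears_after:
            raise NoSuchElement(value)
        return f"<element {by}={value}>"


driver = FakeDriver()
page = BasePage(driver)
element = page.find(("id", "username"))
print(element, "after", driver.calls, "lookups")
```

Because every page class would inherit find, a change to the timeout or the wait condition is made in this one method rather than in each page.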

Driver Layer — session management

WebDriver or Appium gets set up here. Browser type, headless mode, device capabilities — all of it lives in a driver factory that the rest of the project just consumes. Tests get a driver instance handed to them; they don’t create one.

The practical value: if you need to add Firefox support or switch to headless Chrome for CI, you change one file. Nothing in your test suite or page objects needs to know or care.
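
A driver factory along these lines might look like the sketch below. It assumes Selenium 4; the environment variable names TEST_BROWSER and TEST_HEADLESS and the DriverConfig class are hypothetical choices for this example, not part of any library. Selenium is imported lazily inside create_driver so the configuration logic can be exercised on a machine without it.

```python
import os
from dataclasses import dataclass

SUPPORTED_BROWSERS = ("chrome", "firefox")


@dataclass(frozen=True)
class DriverConfig:
    browser: str = "chrome"
    headless: bool = False


def config_from_env(env=None):
    """Read browser settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    return DriverConfig(
        browser=env.get("TEST_BROWSER", "chrome").lower(),
        headless=env.get("TEST_HEADLESS", "0") == "1",
    )


def create_driver(config):
    """The only place in the project that constructs a WebDriver."""
    if config.browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {config.browser!r}")

    # Imported lazily so this module loads even where Selenium is not installed.
    from selenium import webdriver

    if config.browser == "chrome":
        options = webdriver.ChromeOptions()
        if config.headless:
            options.add_argument("--headless=new")
        return webdriver.Chrome(options=options)

    options = webdriver.FirefoxOptions()
    if config.headless:
        options.add_argument("-headless")
    return webdriver.Firefox(options=options)


# A CI job would set the variables; a plain dict stands in for os.environ here.
ci_config = config_from_env({"TEST_BROWSER": "Firefox", "TEST_HEADLESS": "1"})
print(ci_config)
```

Switching CI to headless Firefox then means changing two environment variables; no test or page object is edited.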

Watching One Command Travel Through All Four Layers

The architecture makes more sense when you trace something concrete through it. Take this test:

def test_login_shows_dashboard(driver):
    home = LoginPage(driver).login('[email protected]', 'pass123')
    assert 'Dashboard' in home.get_page_title()

That one line — LoginPage(driver).login(‘[email protected]’, ‘pass123’) — does quite a bit:

  1. The test layer constructs a LoginPage with the driver, then calls its login() method with the credentials
  2. The page layer (LoginPage.login) uses the stored USERNAME and PASSWORD locators to find the fields, types the credentials, finds the submit button, clicks it, waits for navigation, then returns a HomePage instance
  3. Each lookup goes through the base layer’s find helper, which wraps the call in a WebDriverWait, so there’s no manual sleep and far less timing-related flakiness
  4. The driver layer sends the actual WebDriver commands over HTTP to ChromeDriver, which instructs the browser to locate elements and perform actions
  5. The browser does the work, the response travels back up, and the test receives a HomePage object to assert against

Figure 3: One login() call travels through all four layers before the browser does anything — each layer handles exactly its piece

What makes this clean isn’t the complexity — it’s actually simpler than it sounds. It’s that each layer handles one thing and nothing else. The test has no idea how login works mechanically. LoginPage has no idea what assertions the test will make. The base class has no idea what page it’s being used from. That separation is what makes the whole thing maintainable.
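
One way to watch that hand-off happen is to swap the real driver for one that only records commands. The sketch below is a toy under stated assumptions: RecordingDriver and RecordingElement are hypothetical stand-ins, the locators are plain tuples (Selenium’s By.ID is the string "id"), and the base class skips the waiting a real one would do.

```python
class RecordingElement:
    """Element stand-in that logs interactions instead of touching a browser."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def send_keys(self, text):
        self.log.append(f"type into {self.name}")

    def click(self):
        self.log.append(f"click {self.name}")


class RecordingDriver:
    """Driver-layer stand-in: records every command it receives."""

    def __init__(self):
        self.log = []

    def find_element(self, by, value):
        self.log.append(f"find {value}")
        return RecordingElement(value, self.log)


class BasePage:  # base layer
    def __init__(self, driver):
        self.driver = driver

    def find(self, locator):
        # A real base class would wait here before returning the element.
        return self.driver.find_element(*locator)


class HomePage(BasePage):
    pass


class LoginPage(BasePage):  # page layer
    USERNAME = ("id", "username")
    PASSWORD = ("id", "password")
    SUBMIT = ("css selector", 'button[type="submit"]')

    def login(self, username, password):
        self.find(self.USERNAME).send_keys(username)
        self.find(self.PASSWORD).send_keys(password)
        self.find(self.SUBMIT).click()
        return HomePage(self.driver)


driver = RecordingDriver()                           # driver layer
home = LoginPage(driver).login("alice", "pass123")   # test layer: one call
for step in driver.log:
    print(step)
```

The single login() call in the last lines fans out into six recorded commands, none of which the test had to spell out.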

A Real LoginPage Implementation

Abstract patterns only stick when you see them in code. Here’s a LoginPage that does POM right:

from selenium.webdriver.common.by import By

from pages.base_page import BasePage
from pages.home_page import HomePage


class LoginPage(BasePage):
    # Locators live at class level, so a UI change is a one-line edit.
    URL      = '/login'
    USERNAME = (By.ID, 'username')
    PASSWORD = (By.ID, 'password')
    SUBMIT   = (By.CSS_SELECTOR, 'button[type="submit"]')
    ERROR    = (By.CLASS_NAME, 'login-error')

    def login(self, username, password):
        # Happy path: submit the form and hand back the next page.
        self.find(self.USERNAME).send_keys(username)
        self.find(self.PASSWORD).send_keys(password)
        self.find(self.SUBMIT).click()
        return HomePage(self.driver)

    def login_expecting_failure(self, username, password):
        # Failure path: submit the form but stay on the login page.
        self.find(self.USERNAME).send_keys(username)
        self.find(self.PASSWORD).send_keys(password)
        self.find(self.SUBMIT).click()
        return self

    def error_message(self):
        # Returns the text only; the test decides whether it is correct.
        return self.find(self.ERROR).text

A few things worth pointing out. The locators sit at class level — not buried inside methods where they’re harder to find and update. There are two login methods, not one with a boolean flag. login() assumes success and returns the next page. login_expecting_failure() stays on the login page. This makes test code explicit: when you call login_expecting_failure(), anyone reading the test immediately knows an error is expected.

And error_message() just returns text. It does not assert anything. That’s the test’s job:

def test_wrong_password_shows_error(driver):
    page = LoginPage(driver).login_expecting_failure('alice', 'wrongpass')
    assert 'Invalid credentials' in page.error_message()

Mistakes That Seem Fine Until They’re Not

These patterns are common enough that they’re worth naming directly. None of them are obvious mistakes — they feel like reasonable shortcuts when you make them.

Assertions inside page objects

You write a submit() method and add assert ‘Success’ in driver.title because it makes sense in the moment. Now that method can only be used in tests that expect success. You want to test a validation error? You’ll either catch the assertion or duplicate the method. Keep assertions in tests — even when it feels verbose.
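
The difference is easier to see side by side. In the sketch below, FakeDriver, FormPageWithAssert, and FormPage are hypothetical names invented for the comparison, and the actual button click is left out so the example runs without a browser.

```python
class FakeDriver:
    """Stand-in for a browser session; only exposes the page title."""

    def __init__(self, title_after_submit):
        self.title = title_after_submit


class FormPageWithAssert:
    """Anti-pattern: the page object decides what counts as success."""

    def __init__(self, driver):
        self.driver = driver

    def submit(self):
        # (clicking the button is omitted in this sketch)
        assert 'Success' in self.driver.title
        return self


class FormPage:
    """Preferred: perform the action, return data, let the test judge it."""

    def __init__(self, driver):
        self.driver = driver

    def submit(self):
        # (clicking the button is omitted in this sketch)
        return self.driver.title


# The same submit() serves both a success test and a validation test.
assert 'Success' in FormPage(FakeDriver('Success')).submit()
assert 'Error' in FormPage(FakeDriver('Validation Error')).submit()

# The asserting version blows up before the failure test can check anything.
try:
    FormPageWithAssert(FakeDriver('Validation Error')).submit()
    reusable = True
except AssertionError:
    reusable = False
print('asserting submit() reusable for error tests:', reusable)
```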

Using driver.find_element inside test files

This one creeps into test files gradually. One locator here, one there, ‘just this once.’ Six months later you have a hybrid mess where some interactions are in page objects and some are directly in tests. Hold the line: all locators go in page classes, no exceptions.

One enormous page class for the whole application

An ApplicationPage class with 80 methods covering every screen isn’t POM — it’s a different shape of the same problem. If a page class is getting unwieldy, it’s usually because it’s representing more than one page or component. Break it up.

time.sleep() in page methods

Hard sleeps make tests slow and still flaky. They say ‘I think three seconds is enough’ — which is sometimes wrong in both directions. Replace every sleep with an explicit WebDriverWait condition in your base class. Wait for the element to be present, visible, or clickable. That’s more reliable than any fixed delay.

Returning True or False from actions

A click_submit() method that returns True on success and False on failure puts decision logic inside the page object. Now every test has to if/else the return value. Return the next page, or return self, or raise an exception — but don’t return boolean status flags.
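
A minimal sketch of the alternative, assuming a hypothetical SubmitFailed exception and a FakeDriver whose press_submit helper simulates the click (it is not a Selenium method; current_url mirrors the real WebDriver attribute of the same name):

```python
class SubmitFailed(Exception):
    """Raised when the form does not reach the confirmation screen."""


class FakeDriver:
    """Stand-in browser: a valid form navigates on submit, an invalid one does not."""

    def __init__(self, form_is_valid):
        self.form_is_valid = form_is_valid
        self.current_url = '/checkout'

    def press_submit(self):  # hypothetical helper, not a Selenium method
        if self.form_is_valid:
            self.current_url = '/confirmation'


class ConfirmationPage:
    def __init__(self, driver):
        self.driver = driver


class CheckoutPage:
    def __init__(self, driver):
        self.driver = driver

    def click_submit(self):
        """Return the next page on success; raise instead of returning False."""
        self.driver.press_submit()
        if self.driver.current_url != '/confirmation':
            raise SubmitFailed(f'still on {self.driver.current_url}')
        return ConfirmationPage(self.driver)


# The test reads as a straight line: no if/else on a status flag.
confirmation = CheckoutPage(FakeDriver(form_is_valid=True)).click_submit()
print(type(confirmation).__name__)
```

A test that expects the failure can assert on the exception, and a test that expects success gets a useful traceback for free when the page never changes.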

How the Folder Structure Should Look

The code structure matters as much as the pattern itself. A POM project with a flat directory where everything lives in one folder loses half its readability benefit. Here’s a structure that stays navigable at scale:

project/
├── pages/
│   ├── base_page.py
│   ├── login_page.py
│   ├── home_page.py
│   ├── product_page.py
│   ├── cart_page.py
│   └── checkout_page.py
├── tests/
│   ├── test_login.py
│   ├── test_checkout.py
│   └── test_search.py
├── config/
│   └── driver_factory.py
├── resources/
│   └── test_data.json
└── conftest.py

The convention that helps most: one page file per screen. Not one file for ‘all the shopping pages’ — one for ProductPage, one for CartPage, one for CheckoutPage. When you’re looking for where the cart locators live, you go straight to cart_page.py without having to search.

conftest.py handles pytest fixtures — driver setup, teardown, any shared state. driver_factory.py in config/ is the only place a WebDriver instance gets created. Both of those centralizations pay off the first time you need to change something about how your driver initializes.

Honestly, When Is POM Worth It?

I’ve seen people add POM to a five-test suite for a project that ran once and was never touched again. That’s probably overkill. The pattern has overhead — you’re creating more files, more classes, more indirection. For small, throwaway test collections, direct driver calls are fine.

But for anything with a longer shelf life, POM starts paying back fast. Some honest markers that you’ve crossed the line where you need it:

  • You’ve had to fix the same locator in more than two places. Once that has happened, it will happen again, and the next fix takes just as long.
  • More than one person writes tests. Without POM, different people invent different ways to interact with the same pages. Reviewing their code becomes exhausting.
  • The application is actively developed. A UI that changes often will punish you for every duplicate locator in your test suite.
  • You need to run the same tests against multiple environments or browsers. Centralizing driver setup makes this a config change, not a code change.

And even if none of those apply today, starting with the POM structure costs maybe an extra hour upfront. Projects have a way of growing beyond their original scope. Starting with good structure means you’re not refactoring 200 tests later.

Questions People Actually Ask About POM

Does POM work the same way with Appium for mobile testing?

Yes, and pretty seamlessly. The driver layer changes — you pass an Appium driver instead of a Selenium WebDriver — but the page class structure, base class, and test layer all look identical. Some teams share a BasePage between their web and mobile suites, swapping only the driver type. The locators are obviously different, but the pattern is the same.

Should I create a page class for every single page, or just the important ones?

Start with the pages your tests actually cover, not every page in the application. There’s no point creating a ForgotPasswordPage class if no test touches it yet. Add page classes as you write tests. Trying to model the whole application upfront leads to a lot of unused code that still needs maintaining.

My page objects are getting huge. Is that normal?

It usually means one of two things: either the page itself is genuinely complex and needs to be split into components (NavigationBar, SearchPanel, ResultsList as separate classes), or methods have accumulated that belong in the base class rather than specific pages. Check both before just accepting the large class.
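
Splitting into components could look something like the sketch below. NavigationBar and SearchPanel come from the answer above; the Component base class, the locator values, and the fake driver and element classes are assumptions added so the example runs on its own.

```python
class FakeElement:
    """Element stand-in that remembers what was typed into it."""

    def __init__(self, text=''):
        self.text = text
        self.typed = []

    def send_keys(self, value):
        self.typed.append(value)

    def click(self):
        pass


class FakeDriver:
    """Stand-in driver that returns canned elements keyed by locator value."""

    def __init__(self, elements):
        self.elements = elements

    def find_element(self, by, value):
        return self.elements[value]


class Component:
    """Hypothetical shared base for pages and page fragments."""

    def __init__(self, driver):
        self.driver = driver

    def find(self, locator):
        return self.driver.find_element(*locator)


class NavigationBar(Component):
    ACCOUNT_NAME = ('id', 'account-name')

    def account_name(self):
        return self.find(self.ACCOUNT_NAME).text


class SearchPanel(Component):
    QUERY = ('id', 'search-query')
    SUBMIT = ('id', 'search-submit')

    def search(self, term):
        self.find(self.QUERY).send_keys(term)
        self.find(self.SUBMIT).click()


class ProductPage(Component):
    """The page stays small: it composes components instead of absorbing them."""

    def __init__(self, driver):
        super().__init__(driver)
        self.nav = NavigationBar(driver)
        self.search_panel = SearchPanel(driver)


query_box = FakeElement()
driver = FakeDriver({
    'account-name': FakeElement('Alice'),
    'search-query': query_box,
    'search-submit': FakeElement(),
})
page = ProductPage(driver)
page.search_panel.search('laptop')
print(page.nav.account_name(), query_box.typed)
```

Any other page that shows the same navigation bar can reuse NavigationBar as-is, so its locators still live in exactly one place.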

Where does test data like credentials and URLs go?

Not in page objects, and not hardcoded in test methods if you can avoid it. Credentials, test user data, and environment URLs belong in fixtures or external config files that tests pull from. This makes running the same tests against staging vs production a matter of swapping a config value, not editing test files.
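
As one possible shape for this, the sketch below keeps per-environment settings in JSON and selects a block by name. The JSON layout, the example.test URLs, and the load_environment helper are all hypothetical; in a real project the string would be read from a file such as resources/test_data.json, and real credentials would come from a secrets store rather than the repository.

```python
import json

# Hypothetical contents of resources/test_data.json; every value is a placeholder.
TEST_DATA_JSON = """
{
  "staging":    {"base_url": "https://staging.example.test", "user": "alice"},
  "production": {"base_url": "https://www.example.test",     "user": "alice"}
}
"""


def load_environment(raw_json, env_name):
    """Return the settings block for one environment, failing loudly on typos."""
    environments = json.loads(raw_json)
    if env_name not in environments:
        known = ', '.join(sorted(environments))
        raise KeyError(f'unknown environment {env_name!r}; expected one of: {known}')
    return environments[env_name]


settings = load_environment(TEST_DATA_JSON, 'staging')
print(settings['base_url'])
```

A pytest fixture could call a helper like this once and hand the resulting dict to tests, so pointing the suite at another environment changes only the name passed in.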

Can I use POM with pytest? Do I need any special libraries?

No special libraries needed. pytest works naturally with POM — you use fixtures (conftest.py) to provide the driver, and your page classes are just regular Python classes. The pattern is framework-agnostic. It works equally well with unittest or any other test runner.

Final Thought

The teams I’ve seen struggle most with test automation maintenance almost always have the same underlying issue: test code is doing too many jobs at once. It’s figuring out where things are on the page, deciding how to interact with them, setting up state, and verifying results — all in the same function.

POM adds a few files, but it removes a particular kind of complexity by giving each concern its own home. Locators go in page classes. Interaction logic goes in page methods. Assertions go in test methods. Driver setup goes in fixtures.

Once that separation is in place, the test suite starts to feel like a codebase rather than a collection of scripts. Changes are local. Failures are diagnosable. New tests are fast to write because the infrastructure already exists.

It’s worth setting up properly from the start. The hour you invest in the structure early is the hour you don’t spend debugging a 34-test cascade failure six months from now.