Building a Bulletproof Automation Framework: Field Notes from an Architect

December 10, 2025
🎯

Field Notes Summary

  • The Struggles: Bot detection, visual blindness, legacy systems, Shadow DOM, ugly reports
  • The Tools: sb-stealth-wrapper, Visual Guard, Visual Sonar, Lumos ShadowDOM, pytest-glow-report
  • The Blueprint: Layered architecture (Test → Business → Core → Infrastructure)
  • The Laws: Config, Components, Waits, API-first, Observability

I've spent countless hours refactoring 'perfect' code that simply stopped working. In this deep dive, I'm sharing the mistakes, the struggles, and the architecture that emerged from the chaos.

This isn't a theoretical exercise. Every pattern here was born from a real problem I faced over the years. These are some lessons I learned along the way; feel free to use them if you find them useful.

The Open-Source Philosophy
Yes, there are excellent commercial tools on the market that solve many of these problems brilliantly. But I believe you can build 80% of their functionality at almost zero cost to your project. Every tool I share here is free and open-source.
πŸ—οΈ

Part 1: The Struggles That Shaped This Framework

Before we talk about architecture, we have to talk about why standard Selenium/Appium scripts fail. They don't fail because the logic is wrong; they fail because the modern web is hostile, legacy systems are opaque, and UI rendering is unpredictable. Here is how I fought back.

🔥 Struggle #1: The "403 Forbidden" Nightmare

Every automation engineer knows this pain: your script works perfectly locally, then fails in CI with 403 Forbidden. You're not testing a secure bank vaultβ€”you're testing a simple landing page.

I spent months fighting bot detection: Cloudflare Turnstile challenges, 'Verify you are human' loops, and scripts that worked yesterday but fail today. The typical 'please-don't-ban-me' starter pack didn't cut it.

Code
# The OLD WAY - What everyone tries (and fails with)
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless") 
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("user-agent=Mozilla/5.0...")
# ... 20 more lines of hoping it works

# Result: Still blocked.

The Root Cause: Headless Chrome on Linux is a dead giveaway. It screams 'I am a bot' because it lacks a display server.

My Solution: I built sb-stealth-wrapper, a library that wraps SeleniumBase's UC Mode with intelligent defaults. On Linux, it automatically spawns a virtual display (Xvfb), so Chrome *thinks* it has a screen. Combined with heuristic clicking (scroll → hover → click), it bypasses most bot detection.

Code
# The NEW WAY - 4 lines that just work
from sb_stealth_wrapper import StealthBot

with StealthBot(headless=True) as bot:
    bot.safe_get("https://nowsecure.nl")
    bot.smart_click("#verify")  # Human-like interaction
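
Under the hood on Linux, the virtual-display trick boils down to something like this. This is only a sketch using pyvirtualdisplay, not the wrapper's actual source:

Code
# Roughly what the wrapper automates on Linux (assumes pyvirtualdisplay and Xvfb are installed)
from pyvirtualdisplay import Display

display = Display(visible=False, size=(1920, 1080))  # spawns an Xvfb virtual screen
display.start()
# ... launch a regular (non-headless) Chrome here; it renders into the virtual display
display.stop()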

Benchmark Results: Standard Selenium ❌ FAIL | Playwright ❌ FAIL | StealthBot ✅ PASS (see CI run)

📦 PyPI: pypi.org/project/sb-stealth-wrapper

🔥 Struggle #2: "The Test Passed, But the UI is Broken"

We've all been there: the test suite passes 100%, but when you open the application, the submit button is overlapping the footer. This is Visual Blindness: the inability of standard functional scripts to see the application the way a user does.

A button can be 1px wide, transparent, or white-on-white, and Selenium will happily click it and pass the test. Some commercial tools solve this brilliantly with AI-powered visual testing, but they come with enterprise pricing.

The Silent Failures
A test that passes functionally but looks broken to the user is a failed test.

My Solution: I built Visual Guard, a free, open-source Python library that delivers ~80% of the value at zero cost. It follows a strict workflow: Snapshot → Compare → Report. Using SSIM (structural similarity), it ignores minor rendering differences but catches real layout regressions.

Code
from visual_guard import VisualTester

tester = VisualTester()
tester.assert_visual(
    driver, 
    screenshot_name="homepage_v1", 
    mask=[(100, 100, 200, 50)]  # Mask out dynamic ads
)

Threshold Note: Visual Guard ships with a default SSIM threshold of 0.92. Anything below this triggers a failure, catching real regressions while ignoring minor anti-aliasing differences.
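
If you're curious what an SSIM gate looks like mechanically, here is a minimal sketch using scikit-image and OpenCV. It's illustrative, not Visual Guard's internals:

Code
import cv2
from skimage.metrics import structural_similarity

def images_match(baseline_path, current_path, threshold=0.92):
    # Compare two same-sized screenshots in grayscale; SSIM of 1.0 means identical
    baseline = cv2.imread(baseline_path, cv2.IMREAD_GRAYSCALE)
    current = cv2.imread(current_path, cv2.IMREAD_GRAYSCALE)
    score = structural_similarity(baseline, current)
    return score >= threshold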

📦 PyPI: pypi.org/project/visual-guard

🔥 Struggle #3: The "Unautomatable" Legacy Systems

How do you automate a legacy Windows application running inside Windows Virtual Desktop (WVD) or Citrix? You fire up Selenium... and realize there's no DOM. You try UiPath or Blue Prism... and hit a $$$$ licensing wall. You consider AutoIt... and discover that hardcoded coordinates break the moment someone changes their monitor resolution.

My Solution: I built Visual Sonar, a free, open-source tool that borrows from nature. Like bats using echolocation, it presses TAB, detects where pixels changed (the focus ring), and maps form field coordinates dynamically.

Code
1. Take screenshot (BEFORE)
2. Press TAB key
3. Take screenshot (AFTER)
4. Diff the images → Changed region = FIELD

No DOM needed. No expensive licenses. Just Python + OpenCV + PyAutoGUI.
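
The echolocation loop itself is only a few lines of Python. A sketch of the idea, not Visual Sonar's actual source:

Code
import time

import cv2
import numpy as np
import pyautogui

def locate_next_field():
    before = np.array(pyautogui.screenshot().convert("RGB"))  # BEFORE frame
    pyautogui.press("tab")                                    # move keyboard focus
    time.sleep(0.3)                                           # let the focus ring render
    after = np.array(pyautogui.screenshot().convert("RGB"))   # AFTER frame
    diff = cv2.absdiff(before, after)                         # pixel-level change map
    mask = cv2.cvtColor(diff, cv2.COLOR_RGB2GRAY) > 25        # ignore tiny noise
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None                                           # focus didn't move visibly
    return int(xs.mean()), int(ys.mean())                     # centre of the changed region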

Caveat
TAB-order mapping works for standard Windows forms but may fail on owner-drawn or custom-focus applications (Java Swing, Delphi, legacy ActiveX). For those, you may need image template matching as a fallback.

📦 PyPI: pypi.org/project/visual-sonar

🔥 Struggle #4: The Shadow DOM Black Hole

Modern web frameworks like Salesforce Lightning, Angular Material, and Polymer use Shadow DOM to encapsulate components. The problem? Standard Selenium commands can't see inside shadow roots. Your `find_element` calls return nothing, even though the element is right there in the browser.

To access an element, you have to find the host, get the shadow root via JavaScript, and search within. If you have nested shadow roots (common in enterprise apps), you repeat this nightmare recursively.
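
In Selenium 4, one level of that dance looks roughly like this (a sketch; the element names are illustrative and `driver` is an active WebDriver):

Code
from selenium.webdriver.common.by import By

# One hop per shadow boundary: find the host, enter its shadow root, repeat
app = driver.find_element(By.CSS_SELECTOR, "my-app")
panel = app.shadow_root.find_element(By.CSS_SELECTOR, "settings-panel")
panel.shadow_root.find_element(By.CSS_SELECTOR, "#save-btn").click()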

My Solution: I built Lumos ShadowDOM, a package that extends Selenium WebDriver with a simple `find_shadow()` method. It handles all the JavaScript execution and recursive traversal for you.

Code
from selenium import webdriver
import lumos_shadowdom  # Activates the extension

driver = webdriver.Chrome()
driver.get("https://example.com/shadow-dom-app")

# Instead of 15 lines of JS execution:
driver.find_shadow("my-app > settings-panel > #save-btn").click()

📦 PyPI: pypi.org/project/lumos-shadowdom

🔥 Struggle #5: Reports Nobody Reads

Traditional test output is designed for developers, not for the entire team. Product managers want a quick status. QA leads want trends. Executives want a go/no-go decision for release.

I was tired of ugly developer tools. Life's too short for Comic Sans and #FF0000 error text.

My Solution: I built pytest-glow-report, a plugin that generates beautiful HTML reports with an Executive Summary, Risk Level scoring, and visual dashboards. Zero configuration.

Code
pip install pytest-glow-report
pytest --glow-report

📦 PyPI: pypi.org/project/pytest-glow-report

📐

Part 2: The Blueprint – Layered Architecture

Now that we've addressed the specific pain points, let's talk about how to organize all these pieces into a cohesive system. After years of iteration, here's the architecture I recommend:

Code
┌──────────────────────────────────────────────────┐
│                    TEST LAYER                    │
│        (Test Cases, Test Data, Assertions)       │
├──────────────────────────────────────────────────┤
│                  BUSINESS LAYER                  │
│      (Page Objects, API Services, Keywords)      │
├──────────────────────────────────────────────────┤
│                    CORE LAYER                    │
│   (Driver Management, Config, Utilities, Waits)  │
├──────────────────────────────────────────────────┤
│                  INFRASTRUCTURE                  │
│     (CI/CD, Reporting, Logging, Environment)     │
└──────────────────────────────────────────────────┘

The Magic: When the UI changes, only the Page Objects change. When the API changes, only the Service layer changes. Tests remain stable.
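
Here is what that stability looks like from the test layer. A sketch with illustrative names; `LoginPage` and `DashboardPage` live in the business layer:

Code
# tests/test_login.py - the test layer talks only to the business layer
def test_valid_login(browser):
    login_page = LoginPage(browser)   # business layer (page object)
    dashboard = login_page.login("qa@example.com", "secret")
    assert dashboard.is_loaded()      # no selectors, no waits, no raw driver calls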

Recommended Project Structure

Code
automation-framework/
├── config/
│   ├── settings.py
│   └── environments/
│       ├── dev.env
│       ├── staging.env
│       └── prod.env
├── core/
│   ├── driver_factory.py
│   ├── api_client.py
│   └── logger.py
├── pages/
│   ├── base_page.py
│   ├── login_page.py
│   └── components/
│       ├── date_picker.py
│       └── modal.py
├── services/
│   ├── user_service.py
│   └── order_service.py
├── utils/
│   ├── retry.py
│   ├── waits.py
│   └── test_data_factory.py
├── tests/
│   ├── conftest.py
│   ├── test_login.py
│   └── test_checkout.py
├── data/
│   └── test_users.yaml
├── reports/
├── .github/workflows/tests.yml
├── requirements.txt
└── pytest.ini
⚖️

Part 3: The Laws – Coding Standards That Prevent Rot

Architecture is only half the battle. The other half is discipline. Here are the 'laws' I enforce on every project.

Law #1: Never Hardcode. Ever.

I learned this after breaking production tests because someone committed a hardcoded staging URL.

Code
# config/settings.py
import os
from dataclasses import dataclass
from dotenv import load_dotenv

# Load environment-specific config
load_dotenv(f"config/environments/{os.getenv('ENV', 'dev')}.env")

@dataclass
class Config:
    base_url: str = os.getenv("BASE_URL", "https://staging.app.com")
    browser: str = os.getenv("BROWSER", "chrome")
    headless: bool = os.getenv("HEADLESS", "false").lower() == "true"
    timeout: int = int(os.getenv("TIMEOUT", "30"))
    
    # API settings
    api_base_url: str = os.getenv("API_URL", "https://api.staging.app.com")
    api_key: str = os.getenv("API_KEY", "")

config = Config()
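
A matching environment file is just key-value pairs. Illustrative values:

Code
# config/environments/dev.env
BASE_URL=https://dev.app.com
API_URL=https://api.dev.app.com
BROWSER=chrome
HEADLESS=true
TIMEOUT=30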

Law #2: Page Objects Should Be Components

Don't just create Page Objects; create Component Objects. A DatePicker, a Modal, a Dropdown should be reusable across pages.

Code
from selenium.webdriver.common.by import By


class DatePickerComponent:
    def __init__(self, driver, root_element):
        self.driver = driver
        self.root = root_element

    def select_date(self, date_str):
        self.root.click()
        self.driver.find_element(By.CSS_SELECTOR, f"[data-date='{date_str}']").click()


class BookingPage:
    def __init__(self, driver):
        self.driver = driver
        self.date_picker = DatePickerComponent(
            driver,
            driver.find_element(By.ID, "date-picker")
        )

Law #3: 90% of Flakiness is Timing

Synchronization is the root cause of the vast majority of Selenium flakiness.

  • Always use: explicit waits (WebDriverWait), explicit conditions (presence, visibility, clickability), and smart retries with exponential backoff.
  • Never use: time.sleep() or implicit waits.

Code
# utils/retry.py
from functools import wraps
import time

def retry(max_attempts=3, delay=1, exceptions=(Exception,)):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    last_exception = e
                    if attempt < max_attempts - 1:
                        time.sleep(delay * (2 ** attempt))  # Exponential backoff: 1s, 2s, 4s, ...
            raise last_exception
        return wrapper
    return decorator
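
The retry decorator covers transient failures; for element synchronization itself, an explicit-wait helper is just as small. A sketch; the helper name is mine:

Code
# utils/waits.py
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def wait_for_clickable(driver, css_selector, timeout=30):
    # Blocks until the element is visible and enabled, then returns it
    return WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, css_selector))
    )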

Law #4: API First, UI Second

The Testing Pyramid exists for a reason: a broad base of unit tests (instant), a thick middle of API tests (fast), and a thin top of UI tests (slow). Push as much coverage as possible below the UI, and use APIs to seed test data before running UI tests.

Code
import requests

def seed_user(role):
    return requests.post(
        "https://myapp/api/test/createUser",
        json={"role": role}
    ).json()

def test_admin_dashboard(browser):
    user = seed_user("admin")  # API setup (fast)
    login_page = LoginPage(browser)
    login_page.login(user["email"], user["password"])
    assert DashboardPage(browser).is_admin_panel_visible()  # UI validation only

Law #5: Observability is Not Optional

When tests fail at 3 AM in CI, you need four things: structured logging (structlog), screenshots on failure, HTML reports (pytest-glow-report), and Slack/Teams notifications. The screenshot part is a ten-line conftest.py hook, sketched below.
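
This is a common pytest pattern rather than any specific library's feature; it assumes your WebDriver fixture is named `browser`:

Code
# conftest.py - capture a screenshot whenever a test fails
import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        browser = item.funcargs.get("browser")
        if browser is not None:
            browser.save_screenshot(f"reports/{item.name}.png")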

🤖

Part 4: The Future – AI and Self-Healing

As I wrote about in Integrating LLMs into Python Automation Frameworks, the future is intelligent frameworks. We're moving from deterministic automation (rules-based) to probabilistic automation (inference-based).

The Self-Healing Decorator

When an element is missing, instead of failing immediately, ask an LLM for help:

Code
import functools
import logging

from selenium.common.exceptions import NoSuchElementException

def self_healing(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except NoSuchElementException:
            driver = args[0].driver  # assumes the decorated method lives on a page object
            page_source = driver.page_source[:2000]  # Truncate for token limits

            # Ask LLM for help
            prompt = f"""
            I tried to find an element but failed.
            Function: '{func.__name__}'
            HTML: {page_source}

            Suggest the CSS selector that most likely represents the intended element.
            Return ONLY the CSS selector string.
            """
            new_selector = query_local_llm(prompt)  # Using Ollama locally
            # Log every healed locator so a human can review it later
            logging.warning("self-healing %s with selector %r", func.__name__, new_selector)
            return driver.find_element("css selector", new_selector)
    return wrapper

Warning
The self-healing decorator is powerful, but never ship it in CI without a HUMAN_REVIEW_REQUIRED gate. LLM suggestions should be logged for review, not auto-applied. One unchecked 'fix' can cascade into Monday-morning horror stories.

Conclusion

Being an efficient automation engineer is not about writing more scripts; it's about writing better, more robust, and more maintainable ones.

The Core Message: You don't need expensive commercial tools to build a world-class automation framework. With Python and open-source libraries, you can achieve 80% of the functionality at almost zero cost.

  • Identify the pain: bot detection, visual blindness, legacy systems, Shadow DOM, ugly reports
  • Build free tools to solve it: sb-stealth-wrapper, Visual Guard, Visual Sonar, Lumos ShadowDOM, pytest-glow-report
  • Organize into layers: Test → Business → Core → Infrastructure
  • Enforce discipline: Config, Components, Waits, API-first, Observability
  • Embrace the future: AI-powered self-healing

And when the standard tools fail you? Build your own. All the tools I've shared are free, open-source, and available on PyPI. That's how all of this started.

(Use any or none; just my 2¢ after living the same nightmares.)

Dhiraj Das

About the Author

Dhiraj Das is a Senior Automation Consultant specializing in Python, AI, and Intelligent Quality Engineering. Beyond delivering enterprise solutions, he dedicates his free time to tackling complex automation challenges, publishing tools like sb-stealth-wrapper and lumos-shadowdom on PyPI.
