Field Notes Summary
- The Struggles: Bot detection, visual blindness, legacy systems, Shadow DOM, ugly reports
- The Tools: sb-stealth-wrapper, Visual Guard, Visual Sonar, Lumos ShadowDOM, pytest-glow-report
- The Blueprint: Layered architecture (Test → Business → Core → Infrastructure)
- The Laws: Config, Components, Waits, API-first, Observability
I've spent countless hours refactoring 'perfect' code that simply stopped working. In this deep dive, I'm sharing the mistakes, the struggles, and the architecture that emerged from the chaos.
This isn't a theoretical exercise. Every pattern here was born from a real problem I faced over the years. These are some lessons I learned along the way; feel free to use them if you find them useful.
Part 1: The Struggles That Shaped This Framework
Before we talk about architecture, we have to talk about why standard Selenium/Appium scripts fail. They don't fail because the logic is wrong; they fail because the modern web is hostile, legacy systems are opaque, and UI rendering is unpredictable. Here is how I fought back.
🔥 Struggle #1: The "403 Forbidden" Nightmare
Every automation engineer knows this pain: your script works perfectly locally, then fails in CI with 403 Forbidden. You're not testing a secure bank vault; you're testing a simple landing page.
I spent months fighting bot detection: Cloudflare Turnstiles, 'Verify you are human' loops, and scripts that worked yesterday but fail today. The typical 'please-don't-ban-me' starter pack didn't cut it.
# The OLD WAY - What everyone tries (and fails with)
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("user-agent=Mozilla/5.0...")
# ... 20 more lines of hoping it works
# Result: Still blocked.

The Root Cause: Headless Chrome on Linux is a dead giveaway. It screams 'I am a bot' because it runs without a display server.
My Solution: I built sb-stealth-wrapper, a library that wraps SeleniumBase's UC Mode with intelligent defaults. On Linux, it automatically spawns a virtual display (Xvfb), so Chrome *thinks* it has a screen. Combined with heuristic clicking (scroll → hover → click), it gets past most of the bot detection I ran into.
# The NEW WAY - 4 lines that just work
from sb_stealth_wrapper import StealthBot

with StealthBot(headless=True) as bot:
    bot.safe_get("https://nowsecure.nl")
    bot.smart_click("#verify")  # Human-like interaction

Benchmark Results: Standard Selenium → FAIL | Playwright → FAIL | StealthBot → PASS (see CI run)
📦 PyPI: pypi.org/project/sb-stealth-wrapper
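The scroll → hover → click heuristic can be sketched independently of any one library. This is not sb-stealth-wrapper's implementation: `human_like_click`, its `hover` callback, and the pause range are names and values assumed here purely for illustration. The driver only needs `execute_script()` and the element only needs `click()`, which is what Selenium's objects provide.

```python
import random
import time
from typing import Any, Callable, Tuple


def human_like_click(
    driver: Any,
    element: Any,
    hover: Callable[[Any], None],
    pause_range: Tuple[float, float] = (0.2, 0.6),  # assumed values, tune per site
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Scroll the element into view, hover over it, then click, pausing between steps.

    Hypothetical helper: not part of sb-stealth-wrapper or SeleniumBase.
    """
    # 1. Scroll: bring the element to the middle of the viewport.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    sleep(random.uniform(*pause_range))

    # 2. Hover: move the pointer onto the element before clicking.
    hover(element)
    sleep(random.uniform(*pause_range))

    # 3. Click.
    element.click()
```

With plain Selenium, the `hover` argument could be `lambda el: ActionChains(driver).move_to_element(el).perform()`. Passing `hover` and `sleep` in as parameters keeps the sequence testable without a browser.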
🔥 Struggle #2: "The Test Passed, But the UI is Broken"
We've all been there: the test suite passes 100%, but when you open the application, the submit button is overlapping the footer. This is Visual Blindness: the inability of standard functional scripts to see the application the way a user does.
A button can be 1px wide, transparent, or white-on-white, and Selenium will happily click it and pass the test. Some commercial tools solve this brilliantly with AI-powered visual testing, but they come with enterprise pricing.
My Solution: I built Visual Guard, a free, open-source Python library that delivers ~80% of the value at zero cost. It follows a strict workflow: Snapshot → Compare → Report. It scores each comparison with SSIM (Structural Similarity Index), so it ignores minor rendering differences but catches real layout regressions.
from visual_guard import VisualTester

tester = VisualTester()
tester.assert_visual(
    driver,  # An existing Selenium WebDriver
    screenshot_name="homepage_v1",
    mask=[(100, 100, 200, 50)],  # Mask out dynamic ads
)

Threshold Note: Visual Guard ships with an SSIM threshold of 0.92. Any score below it triggers a failure, which catches real regressions while ignoring minor anti-aliasing differences.
📦 PyPI: pypi.org/project/visual-guard
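To make the Compare step concrete, here is a dependency-free sketch of a masked SSIM check. It is not Visual Guard's code: it computes a single SSIM value over the whole image, whereas production implementations (scikit-image's `structural_similarity`, for example) average over sliding windows. The helper names and the `(x, y, width, height)` reading of the mask tuple are assumptions; only the 0.92 threshold comes from the note above.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Image = List[List[float]]        # grayscale rows, values 0-255
Box = Tuple[int, int, int, int]  # assumed order: (x, y, width, height)


def apply_masks(image: Image, masks: Sequence[Box]) -> Image:
    """Return a copy with every masked rectangle painted black."""
    out = [row[:] for row in image]
    for x, y, w, h in masks:
        for r in range(y, min(y + h, len(out))):
            for c in range(x, min(x + w, len(out[r]))):
                out[r][c] = 0.0
    return out


def global_ssim(a: Image, b: Image) -> float:
    """Single-window SSIM over the whole image (1.0 means identical)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same dimensions")
    mx, my = fmean(xs), fmean(ys)
    vx = fmean((p - mx) ** 2 for p in xs)
    vy = fmean((q - my) ** 2 for q in ys)
    cov = fmean((p - mx) * (q - my) for p, q in zip(xs, ys))
    # Standard stabilising constants for 8-bit images.
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def assert_visual_match(
    baseline: Image,
    current: Image,
    masks: Sequence[Box] = (),
    threshold: float = 0.92,
) -> float:
    """Mask both images, score them, and fail if the score is below threshold."""
    score = global_ssim(apply_masks(baseline, masks), apply_masks(current, masks))
    if score < threshold:
        raise AssertionError(f"SSIM {score:.3f} is below threshold {threshold}")
    return score
```

Masking both images with the same rectangles is what lets a rotating ad change freely without dragging the score down.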
🔥 Struggle #3: The "Unautomatable" Legacy Systems
How do you automate a legacy Windows application running inside Windows Virtual Desktop (WVD) or Citrix? You fire up Selenium... and realize there's no DOM. You try UiPath or Blue Prism... and hit a $$$$ licensing wall. You consider AutoIt... and discover the coordinates break when someone changes their monitor.
My Solution: I built Visual Sonar, a free, open-source tool that borrows from nature. Like a bat using echolocation, it presses TAB, detects where the pixels changed (the focus ring), and maps form field coordinates dynamically.
1. Take screenshot (BEFORE)
2. Press TAB key
3. Take screenshot (AFTER)
4. Diff the images → Changed region = FIELD
No DOM needed. No expensive licenses. Just Python + OpenCV + PyAutoGUI.
📦 PyPI: pypi.org/project/visual-sonar
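Step 4 is the heart of the idea, so here is a minimal sketch of it using plain nested lists in place of real screenshots. It is an illustration under assumptions, not Visual Sonar's code: `changed_region` and its `tolerance` parameter are hypothetical names, and a real tool would get its frames from something like `pyautogui.screenshot()` and do the diff with OpenCV.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale rows, values 0-255


def changed_region(
    before: Frame, after: Frame, tolerance: int = 10
) -> Optional[Tuple[int, int, int, int]]:
    """Return the bounding box (x, y, width, height) of changed pixels, or None.

    A pixel counts as changed when its brightness moved by more than
    `tolerance`, which filters out compression noise on remote desktops.
    """
    xs: List[int] = []
    ys: List[int] = []
    for y, (row_before, row_after) in enumerate(zip(before, after)):
        for x, (pb, pa) in enumerate(zip(row_before, row_after)):
            if abs(pb - pa) > tolerance:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # Nothing changed: focus did not move
    x0, y0 = min(xs), min(ys)
    return x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1


if __name__ == "__main__":
    blank = [[0] * 8 for _ in range(6)]
    focused = [row[:] for row in blank]
    for y in (2, 3):
        for x in range(3, 7):
            focused[y][x] = 255  # Simulated focus ring
    print(changed_region(blank, focused))
```

One caveat worth knowing: on the second and later TAB presses the diff contains two regions, the old focus ring disappearing and the new one appearing, so a single bounding box is only reliable for the first press unless the two regions are separated first.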
🔥 Struggle #4: The Shadow DOM Black Hole
Modern web frameworks like Salesforce Lightning, Angular Material, and Polymer use Shadow DOM to encapsulate components. The problem? Standard Selenium locators can't see inside shadow roots. Your `find_element` calls raise NoSuchElementException, even though the element is right there in the browser.
To access an element, you have to find the host, get its shadow root (through JavaScript, or through the `shadow_root` property that Selenium 4 added), and search within it. If you have nested shadow roots (common in enterprise apps), you repeat this nightmare at every level.
My Solution: I built Lumos ShadowDOM, a package that extends Selenium WebDriver with a simple `find_shadow()` method. It handles all the JavaScript execution and recursive traversal for you.
from selenium import webdriver
import lumos_shadowdom  # Activates the extension

driver = webdriver.Chrome()
driver.get("https://example.com/shadow-dom-app")
# Instead of 15 lines of JS execution:
driver.find_shadow("my-app > settings-panel > #save-btn").click()

📦 PyPI: pypi.org/project/lumos-shadowdom
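For anyone curious what that traversal involves, here is a rough sketch of the host → shadow root → search loop. It is not Lumos ShadowDOM's source: `split_shadow_path`, `build_shadow_script`, and `find_in_shadow` are hypothetical names, and treating ` > ` as the separator between shadow hosts is an assumption borrowed from the example path above. The driver only needs an `execute_script()` method, which Selenium's WebDriver has.

```python
import json
from typing import Any, List


def split_shadow_path(path: str) -> List[str]:
    """Split 'host > nested-host > target' into individual CSS selectors."""
    parts = [p.strip() for p in path.split(">")]
    if not all(parts):
        raise ValueError(f"empty segment in shadow path: {path!r}")
    return parts


def build_shadow_script(path: str) -> str:
    """Build JavaScript that steps through each shadow root in turn."""
    selectors = split_shadow_path(path)
    return (
        f"const selectors = {json.dumps(selectors)};\n"
        "let node = document;\n"
        "for (const [i, sel] of selectors.entries()) {\n"
        "  node = node.querySelector(sel);\n"
        "  if (node === null) { return null; }\n"
        "  // Every segment except the last is a shadow host: step inside it.\n"
        "  if (i < selectors.length - 1) {\n"
        "    node = node.shadowRoot;\n"
        "    if (node === null) { return null; }\n"
        "  }\n"
        "}\n"
        "return node;"
    )


def find_in_shadow(driver: Any, path: str) -> Any:
    """Run the traversal through any object exposing execute_script()."""
    element = driver.execute_script(build_shadow_script(path))
    if element is None:
        raise LookupError(f"no element matches shadow path {path!r}")
    return element
```

Two limits of this sketch: closed shadow roots report `shadowRoot` as null and cannot be entered this way, and because ` > ` is reserved as the host separator, a CSS child combinator cannot appear inside a segment.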
🔥 Struggle #5: Reports Nobody Reads
Traditional test output is designed for developers, not for the entire team. Product managers want a quick status. QA leads want trends. Executives want a go/no-go decision for release.
I was tired of ugly developer tools. Life's too short for Comic Sans and #FF0000 error text.
My Solution: I built pytest-glow-report, a plugin that generates beautiful HTML reports with Executive Summary, Risk Level scoring, and visual dashboards. Zero configuration.
pip install pytest-glow-report
pytest --glow-report

📦 PyPI: pypi.org/project/pytest-glow-report
Part 2: The Blueprint - Layered Architecture
Now that we've addressed the specific pain points, let's talk about how to organize all these pieces into a cohesive system. After years of iteration, here's the architecture I recommend:
┌─────────────────────────────────────────────────┐
│                   TEST LAYER                    │
│       (Test Cases, Test Data, Assertions)       │
├─────────────────────────────────────────────────┤
│                 BUSINESS LAYER                  │
│     (Page Objects, API Services, Keywords)      │
├─────────────────────────────────────────────────┤
│                   CORE LAYER                    │
│  (Driver Management, Config, Utilities, Waits)  │
├─────────────────────────────────────────────────┤
│                 INFRASTRUCTURE                  │
│    (CI/CD, Reporting, Logging, Environment)     │
└─────────────────────────────────────────────────┘

The Magic: When the UI changes, only the Page Objects change. When the API changes, only the Service layer changes. Tests remain stable.
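Here is a small sketch of why that isolation holds, using a recording stand-in for the driver so it runs without a browser. Every name in it (`RecordingDriver`, `RedesignedLoginPage`, `login_as_alice`, the selectors) is invented for illustration and is not taken from a real project.

```python
from typing import List, Tuple


class RecordingDriver:
    """Core-layer stand-in: records actions instead of driving a browser."""

    def __init__(self) -> None:
        self.actions: List[Tuple[str, ...]] = []

    def type(self, selector: str, text: str) -> None:
        self.actions.append(("type", selector, text))

    def click(self, selector: str) -> None:
        self.actions.append(("click", selector))


class LoginPage:
    """Business layer: the only code that knows the login form's selectors."""

    USERNAME = "#username"
    PASSWORD = "#password"
    SUBMIT = "#login-btn"

    def __init__(self, driver: RecordingDriver) -> None:
        self.driver = driver

    def login(self, username: str, password: str) -> None:
        self.driver.type(self.USERNAME, username)
        self.driver.type(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)


class RedesignedLoginPage(LoginPage):
    """After a UI redesign, the selector is the only thing that changes."""

    SUBMIT = "[data-testid='sign-in']"


def login_as_alice(page: LoginPage) -> None:
    """Test layer: expresses intent and never mentions a selector."""
    page.login("alice", "s3cret")
```

`login_as_alice` runs unchanged against both page classes. The redesign touched one constant in the business layer, and the test layer never noticed.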
Recommended Project Structure
automation-framework/
├── config/
│   ├── settings.py
│   └── environments/
│       ├── dev.env
│       ├── staging.env
│       └── prod.env
├── core/
│   ├── driver_factory.py
│   ├── api_client.py
│   └── logger.py
├── pages/
│   ├── base_page.py
│   ├── login_page.py
│   └── components/
│       ├── date_picker.py
│       └── modal.py
├── services/
│   ├── user_service.py
│   └── order_service.py
├── utils/
│   ├── retry.py
│   ├── waits.py
│   └── test_data_factory.py
├── tests/
│   ├── conftest.py
│   ├── test_login.py
│   └── test_checkout.py
├── data/
│   └── test_users.yaml
├── reports/
├── .github/workflows/tests.yml
├── requirements.txt
└── pytest.ini

Part 3: The Laws - Coding Standards That Prevent Rot
Architecture is only half the battle. The other half is discipline. Here are the 'laws' I enforce on every project.
Law #1: Never Hardcode. Ever.
I learned this after breaking production tests because someone committed a hardcoded staging URL.
# config/settings.py
import os
from dataclasses import dataclass

from dotenv import load_dotenv

# Load environment-specific config
load_dotenv(f"config/environments/{os.getenv('ENV', 'dev')}.env")


# Defaults are read once, at import time, after load_dotenv() has run
@dataclass
class Config:
    base_url: str = os.getenv("BASE_URL", "https://staging.app.com")
    browser: str = os.getenv("BROWSER", "chrome")
    headless: bool = os.getenv("HEADLESS", "false").lower() == "true"
    timeout: int = int(os.getenv("TIMEOUT", "30"))
    # API settings
    api_base_url: str = os.getenv("API_URL", "https://api.staging.app.com")
    api_key: str = os.getenv("API_KEY", "")


config = Config()

Law #2: Page Objects Should Be Components
Don't just create Page Objects; create Component Objects. A DatePicker, a Modal, or a Dropdown should be reusable across pages.
from selenium.webdriver.common.by import By


class DatePickerComponent:
    def __init__(self, driver, root_element):
        self.driver = driver
        self.root = root_element  # The element that hosts this component

    def select_date(self, date_str):
        self.root.click()  # Open the picker
        self.driver.find_element(By.CSS_SELECTOR, f"[data-date='{date_str}']").click()


class BookingPage:
    def __init__(self, driver):
        self.driver = driver
        self.date_picker = DatePickerComponent(
            driver,
            driver.find_element(By.ID, "date-picker"),
        )

Law #3: Most Flakiness is Timing
In my experience, synchronization problems are the root cause of most Selenium flakiness, roughly 60-70% of it.
Always Use:
- Explicit waits (WebDriverWait)
- Conditions (presence, visibility, clickability)
- Smart retries with exponential backoff
Never Use:
- time.sleep() as a substitute for waiting on a condition
- Implicit waits
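The difference between an explicit wait and a blind sleep is easiest to see in code. Below is a browser-free sketch of the polling loop behind an explicit wait; `wait_until` and its parameters are hypothetical names for illustration. In a real suite you would reach for Selenium's own `WebDriverWait(driver, timeout).until(...)` rather than rolling your own.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 10.0,
    poll_interval: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result  # Ready: return immediately, no wasted time
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(poll_interval)  # Short, bounded pause between checks
```

The loop returns the moment the condition holds and fails loudly when it never does. A fixed `time.sleep(5)` does neither: it always costs five seconds and still breaks on the day the page needs six.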
# utils/retry.py
import time
from functools import wraps


def retry(max_attempts=3, delay=1, exceptions=(Exception,)):
    """Retry the wrapped function, doubling the pause after each failed attempt."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    last_exception = e
                    if attempt < max_attempts - 1:
                        # Exponential backoff: delay, 2*delay, 4*delay, ...
                        time.sleep(delay * (2 ** attempt))
            raise last_exception
        return wrapper
    return decorator

Law #4: API First, UI Second
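The Testing Pyramid exists for a reason: the lower the layer, the faster and more stable the test. For an end-to-end automation suite, the split I aim for is roughly 70-80% API tests (fast) and 10-20% UI tests (slow), with unit tests (instant) filling whatever remains, around 10%. Use APIs to seed test data before running UI tests.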
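import requests


def seed_user(role):
    """Create a test user through the API and return its JSON payload."""
    response = requests.post(
        "https://myapp/api/test/createUser",
        json={"role": role},
        timeout=10,  # Never let a hung request stall the suite
    )
    response.raise_for_status()  # Fail fast if seeding did not work
    return response.json()


def test_admin_dashboard(browser):
    # LoginPage and DashboardPage are page objects defined elsewhere in the framework
    user = seed_user("admin")  # API setup (fast)
    login_page = LoginPage(browser)
    login_page.login(user["email"], user["password"])
    assert DashboardPage(browser).is_admin_panel_visible()  # UI validation only

Law #5: Observability is Not Optional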
When tests fail at 3 AM in CI, you need: Structured Logging (structlog), Screenshots on Failure, HTML Reports (pytest-glow-report), Slack/Teams Notifications.
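As a taste of what structured logging buys you, here is a sketch that emits one JSON object per log line. To keep it dependency-free it uses the standard library's `logging` module instead of structlog, so treat it as an illustration of the idea rather than a structlog recipe. `JsonFormatter`, `build_logger`, the `context` key, and the screenshot path are all names assumed for this example.

```python
import json
import logging
import sys
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} become record.context.
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def build_logger(name: str, stream: TextIO) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` and nowhere else."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("tests.checkout", sys.stderr)
    log.error(
        "step failed",
        extra={"context": {"test": "test_checkout",
                           "screenshot": "reports/test_checkout.png"}},
    )
```

Because every line is machine-readable, the 3 AM failure can be filtered by test name or joined to its screenshot path without grepping free-form text.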
Part 4: The Future - AI and Self-Healing
As I wrote about in Integrating LLMs into Python Automation Frameworks, the future is intelligent frameworks. We're moving from deterministic automation (rules-based) to probabilistic automation (inference-based).
The Self-Healing Decorator
When an element is missing, instead of failing immediately, ask an LLM for help:
import functools

from selenium.common.exceptions import NoSuchElementException


def self_healing(func):
    """Fall back to an LLM-suggested selector when the original lookup fails."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except NoSuchElementException:
            # Assumes func is a page-object method, so args[0] is `self`
            driver = args[0].driver
            page_source = driver.page_source[:2000]  # Truncate for token limits
            # Ask LLM for help
            prompt = f"""
            I tried to find an element but failed.
            Function: '{func.__name__}'
            HTML: {page_source}
            Suggest the CSS selector that most likely represents the intended element.
            Return ONLY the CSS selector string.
            """
            # query_local_llm is your own helper (not defined here)
            new_selector = query_local_llm(prompt).strip()  # Using Ollama locally
            return driver.find_element("css selector", new_selector)
    return wrapper

Conclusion
Being an efficient automation engineer is not about writing more scripts; it's about writing better, more robust, and highly maintainable ones.
The Core Message: You don't need expensive commercial tools to build a world-class automation framework. With Python and open-source libraries, you can achieve 80% of the functionality at almost zero cost.
- Identify the pain: bot detection, visual blindness, legacy systems, Shadow DOM, ugly reports
- Build free tools to solve it: sb-stealth-wrapper, Visual Guard, Visual Sonar, Lumos ShadowDOM, pytest-glow-report
- Organize into layers: Test → Business → Core → Infrastructure
- Enforce discipline: Config, Components, Waits, API-first, Observability
- Embrace the future: AI-powered self-healing
And when the standard tools fail you? Build your own. All the tools I've shared are free, open-source, and available on PyPI. That's how all of this started.
(Use any or none. Just my 2¢ after living the same nightmares.)

