Key Insights
- Research Shapes Thinking: Academic principles influence how I approach automation problems
- Noise = Flakiness: The same mental model for filtering spatial noise applies to test stability
- Efficiency Matters: Algorithmic thinking drives optimization in wait strategies and element selection
- Pattern Recognition: The mindset of "finding order in chaos" applies to self-healing frameworks
*"Just click this, type that, check this."*
If this is how you think about test automation, you're building a house of cards.
Most automation engineers focus on actions: what to click, what to type, what to assert. But treating automation as just a sequence of actions leads to brittle scripts that shatter the moment the UI changes a single class name.
I don't look at automation as scripting. I look at it as a data problem.
Why? Because long before I was building automation frameworks in Python, my co-author Hrishav and I were researching algorithmic efficiency in spatial databases. That research, published in 2012, didn't teach me a specific technique to copy-paste into Selenium. But it fundamentally shaped *how I think* about complex data problems.
The Foundation: The TDCT Algorithm
Back in 2012, Hrishav and I co-authored a research paper titled *"A Density Based Clustering Technique For Large Spatial Data Using Polygon Approach"* (TDCT).
The Problem We Solved
How do you find meaningful patterns (clusters) in massive, chaotic datasets without getting overwhelmed by noise?
Existing algorithms like DBSCAN were good, but they struggled with:
- Arbitrary shapes: Real-world data doesn't form neat circles
- Computational cost: Scanning every point against every other point doesn't scale
- Noise sensitivity: Outliers distort cluster boundaries
Our Solution: Triangular Density
Instead of using circular neighborhoods (like DBSCAN), we mapped data points into triangular polygons. This allowed us to:
- Calculate density more efficiently than radial scans
- Detect clusters of arbitrary, non-convex shapes
- Isolate noise points without corrupting core clusters
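To ground the idea, here is a minimal, illustrative sketch of the circular-neighborhood density check that DBSCAN-style clustering starts from, which is the baseline TDCT's polygon approach improves on. The `eps` and `min_pts` values are arbitrary choices for the example; this is not the TDCT algorithm itself.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def classify_points(points: list[tuple[float, float]], eps: float, min_pts: int) -> dict:
    """Label each point as a dense 'core' point or as 'noise'.

    This is the naive O(n^2) radial scan; TDCT's triangular-polygon
    approach exists precisely to avoid this cost and to handle
    arbitrary, non-convex cluster shapes.
    """
    labels = {}
    for p in points:
        # Count how many other points fall inside the eps-radius neighborhood.
        neighbors = sum(1 for q in points if q != p and dist(p, q) <= eps)
        labels[p] = "core" if neighbors >= min_pts else "noise"
    return labels

# Example: one tight cluster plus a single outlier.
data = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0)]
print(classify_points(data, eps=0.5, min_pts=2))
# {(0.0, 0.0): 'core', (0.1, 0.1): 'core', (0.2, 0.0): 'core', (5.0, 5.0): 'noise'}
```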
The Bridge: Why This Matters for Quality Engineering
*"Dhiraj, what does spatial clustering have to do with Selenium?"*
Not the code; the mindset.
It's About Problem Framing
When I encounter a complex automation challenge, I don't immediately think "what Selenium command do I need?" I think:
- What's the data structure here? (The DOM is a tree, test results are time-series data)
- What's the noise vs. the signal? (Which element attributes are stable? Which failures are true bugs? See the sketch after this list.)
- How can I reduce complexity? (Can I optimize the problem's "geometry" like TDCT did?)
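Here is a small, hypothetical sketch of that noise-vs-signal framing applied to test results: failures are grouped per test, intermittent failures are flagged as likely flakiness (noise), and consistent failures are treated as probable real defects (signal). The `FailureRecord` shape and the 0.8 ratio are assumptions for illustration, not part of any specific framework.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class FailureRecord:      # hypothetical record of one test execution
    test_name: str
    passed: bool

def separate_signal_from_noise(history: list[FailureRecord], fail_ratio: float = 0.8):
    """Split failing tests into probable bugs (signal) and probable flakes (noise)."""
    runs = defaultdict(list)
    for record in history:
        runs[record.test_name].append(record.passed)

    signal, noise = [], []
    for name, results in runs.items():
        failures = results.count(False)
        if failures == 0:
            continue  # never failed: nothing to triage
        # Fails in >= fail_ratio of runs -> likely a real regression; otherwise flaky.
        (signal if failures / len(results) >= fail_ratio else noise).append(name)
    return signal, noise
```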
This mental model, trained by years of algorithmic research, influences every framework decision I make.
Applying the Mindset: Practical Examples
Example 1: Multi-Attribute Element Location with Fallback Logic
Brute-Force Approach (like naive spatial scanning, a single point of failure):
```python
# If ID changes, everything breaks
element = driver.find_element(By.ID, "checkout-btn-v3")
```

Algorithmic Approach (like TDCT's density-core identification, drawing on multiple data points):
```python
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def find_element_with_fallback(driver, strategies: list[tuple]) -> WebElement:
    """
    Instead of relying on one brittle locator, we analyze multiple
    'data points' (attributes) ordered by reliability/stability.
    Like TDCT identifies cluster cores by density, we identify
    elements by attribute stability.
    """
    for strategy, locator in strategies:
        try:
            element = WebDriverWait(driver, 2).until(
                EC.element_to_be_clickable((strategy, locator))
            )
            return element
        except TimeoutException:
            continue
    raise NoSuchElementException("All strategies exhausted")

# Define strategies ordered by stability (most stable first)
checkout_strategies = [
(By.CSS_SELECTOR, "[data-testid='checkout']"), # Most stable: test IDs
(By.CSS_SELECTOR, "button[aria-label='Checkout']"), # Accessibility attrs
(By.XPATH, "//button[contains(text(), 'Checkout')]"), # Text content
(By.CSS_SELECTOR, ".cart-section button.primary"), # Structural fallback
]
checkout_btn = find_element_with_fallback(driver, checkout_strategies)
```

This reflects the TDCT mindset: instead of relying on a single identifier (like a single spatial coordinate), we cluster multiple attributes by reliability and select the highest-confidence match. The first strategy is our "density core"; if it fails, we gracefully fall back to less stable but still valid "neighbors."
Example 2: Self-Healing Element Location
Static Approach (brittle, noise-sensitive):
```python
driver.find_element(By.ID, "submit-btn-v3")  # Breaks when ID changes
```

Adaptive Approach (cluster-like resilience):
```python
# When an element isn't found, analyze multiple attributes:
# - Text content (stable?)
# - Class names (which are consistent?)
# - Position relative to stable anchors
# Then select the "highest confidence" match
```

This isn't literally running TDCT. But the *thinking* is the same: instead of relying on a single brittle identifier, we analyze multiple "data points" (attributes) to find the most stable combination.
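To make that adaptive idea concrete, here is a minimal, hypothetical sketch of the confidence-scoring approach. It assumes an existing `driver` session; the attribute weights, the `expected` profile, and the 0.6 threshold are illustrative choices, not the actual implementation in my frameworks.

```python
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

def score_candidate(element, expected: dict) -> float:
    """Score a candidate element by how many 'stable' attributes still match."""
    score = 0.0
    if expected.get("text") and expected["text"] in (element.text or ""):
        score += 0.5  # visible text: usually the strongest signal
    if expected.get("classes"):
        classes = set((element.get_attribute("class") or "").split())
        score += 0.3 * len(classes & expected["classes"]) / len(expected["classes"])
    if expected.get("aria_label") and expected["aria_label"] == element.get_attribute("aria-label"):
        score += 0.2  # accessibility attributes rarely churn
    return score

def heal_locator(driver, expected: dict, min_confidence: float = 0.6):
    """When the primary locator breaks, pick the highest-confidence candidate."""
    candidates = driver.find_elements(By.CSS_SELECTOR, expected.get("tag", "*"))
    best = max(candidates, key=lambda el: score_candidate(el, expected), default=None)
    if best is None or score_candidate(best, expected) < min_confidence:
        raise NoSuchElementException("No candidate above the confidence threshold")
    return best

# Hypothetical usage: the original ID changed, so we heal from known-stable attributes.
submit_btn = heal_locator(driver, {
    "tag": "button",
    "text": "Submit",
    "classes": {"primary"},
    "aria_label": "Submit form",
})
```

The position-relative-to-stable-anchors signal is omitted here for brevity; it would slot in as just another weighted term in `score_candidate`.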
The Tools I Build Reflect This Philosophy
When I created packages like Lumos ShadowDOM or Visual Guard, I wasn't consciously implementing clustering algorithms. But the design decisions reflect the same principles:
- Traversing Shadow DOM efficiently → Understanding the *structure* of the problem before brute-forcing
- Visual regression with SSIM → Using mathematical models (not pixel-by-pixel noise) to find meaningful differences (sketched below)
- Self-healing in my frameworks → Treating element attributes as "data points" with varying reliability
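Here is a minimal sketch of the SSIM comparison mentioned above, using scikit-image's `structural_similarity`. The screenshot paths and the 0.98 threshold are assumptions for the example, not Visual Guard's actual API.

```python
from skimage.color import rgb2gray
from skimage.io import imread
from skimage.metrics import structural_similarity

def screenshots_match(baseline_path: str, current_path: str, threshold: float = 0.98) -> bool:
    """Compare two screenshots structurally instead of pixel-by-pixel."""
    # Assumes RGB screenshots of identical dimensions; rgb2gray yields floats in [0, 1].
    baseline = rgb2gray(imread(baseline_path)[:, :, :3])
    current = rgb2gray(imread(current_path)[:, :, :3])
    # SSIM scores structural similarity; 1.0 means the images are identical.
    score = structural_similarity(baseline, current, data_range=1.0)
    return score >= threshold

# Hypothetical usage with two saved screenshots of the same page state.
if not screenshots_match("baseline.png", "current.png"):
    print("Visual difference detected - review before failing the build")
```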
The research doesn't give me copy-paste solutions. It gives me a lens for seeing automation as a data problem, not a scripting problem.
Conclusion: Automation Isn't Just Code, It's Logic
Whether it's the TDCT algorithm Hrishav and I published years ago or the automation tools and libraries I build today, the goal remains the same: finding meaningful patterns in chaotic data without being overwhelmed by noise.
The DOM is chaotic. Test data is chaotic. UI changes are chaotic.
But with the right algorithmic mindset, trained by research in one domain, we can bring that discipline to another domain entirely.
Read the Original Research
*A Density Based Clustering Technique For Large Spatial Data Using Polygon Approach* (TDCT), published on ResearchGate

