Agentic AI Reliability — Built on 10+ Years of Automation

I build reliable AI agents with the discipline of test automation.

As AI absorbs routine scripting, the valuable layer moves upward: making autonomous systems observable, testable, and accountable. My automation background is the advantage.

I am moving from traditional automation into agentic AI because deterministic execution is being commoditized. A decade of test architecture, failure isolation, and evidence-first delivery now powers my work on agent run capture, validation harnesses, replay, and postmortems.

Why the shift is happening

Routine automation is becoming a commodity.

AI agents increasingly generate and execute the deterministic scripts that once required specialist effort. The durable engineering problem is proving that autonomous work is correct.

Why my background wins

Automation discipline is the advantage.

Ten years of assertions, fixtures, failure isolation, CI diagnostics, and evidence capture transfer directly to agents that call tools, edit files, and make decisions.

Where I am building

Agentic AI reliability is the next layer.

I build local-first run capture, validation harnesses, replay, redaction, and failure postmortems so agentic systems can be trusted under production pressure.
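To make run capture and redaction concrete, here is a minimal Python sketch of the idea, assuming a local JSON Lines file as the run log. `RunRecorder`, `redact`, and the secret patterns are hypothetical names chosen for illustration; this is not the Agent Blackbox API.

```python
import json
import re
import tempfile
import time
from pathlib import Path

# Hypothetical redaction rules: anything that looks like a secret is masked
# before the event is written to disk.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),       # assumed API-key-like token format
    re.compile(r"(?i)(password|token)=\S+"),  # key=value style credentials
]


def redact(text: str) -> str:
    """Replace every secret-looking substring with a fixed marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


class RunRecorder:
    """Append-only local log of one agent run, one JSON object per line."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.step = 0

    def record(self, kind: str, payload: dict) -> None:
        self.step += 1
        event = {
            "step": self.step,
            "ts": time.time(),
            "kind": kind,
            # Only string values are redacted; other types are stored as-is.
            "payload": {
                key: redact(value) if isinstance(value, str) else value
                for key, value in payload.items()
            },
        }
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")

    def events(self) -> list:
        """Load the run back, in order, for replay or postmortem analysis."""
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rec = RunRecorder(Path(tmp) / "run.jsonl")
        rec.record("tool_call", {"tool": "http_get", "args": "url=https://example.com token=abc123"})
        rec.record("tool_result", {"status": 200})
        print(rec.events()[0]["payload"]["args"])  # url=https://example.com [REDACTED]
```

Redacting before the write, rather than at display time, means a secret never reaches disk, which keeps a captured run safe to share in a postmortem.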

Featured in PyCoder's Weekly #714
From Algorithms to Automation
Python Tools on PyPI
Open Source on GitHub

Behind the Code

My work sits at the intersection of test automation and agentic AI reliability. After years of building test frameworks, stabilizing brittle browser flows, debugging CI failures, and turning ambiguous defects into reproducible evidence, I apply the same engineering discipline to AI-agent validation, run observability, and reliability tooling.

Agent testing is not just prompt evaluation. Reliable agentic systems need observability, replayable evidence, failure taxonomies, browser/runtime signals, safe redaction, and postmortems that explain risk. That is the bridge I am building through Agent Blackbox and related reliability tooling.
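A failure taxonomy can start very small. The Python sketch below assumes each recorded step carries a kind, a pass/fail flag, and a free-text detail; `FailureKind`, `Event`, and `classify` are hypothetical, and the four categories are placeholders rather than a published taxonomy.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    # Placeholder categories for illustration only.
    TIMEOUT = "timeout"
    TOOL_ERROR = "tool_error"
    VALIDATION = "validation_failed"
    UNKNOWN = "unknown"


@dataclass
class Event:
    kind: str  # assumed step kinds: "tool_call", "tool_result", "check"
    ok: bool
    detail: str = ""


def classify(events: list[Event]) -> FailureKind | None:
    """Classify the first failing event; return None if every step passed."""
    for event in events:
        if event.ok:
            continue
        if "timeout" in event.detail.lower():
            return FailureKind.TIMEOUT
        if event.kind == "tool_result":
            return FailureKind.TOOL_ERROR
        if event.kind == "check":
            return FailureKind.VALIDATION
        return FailureKind.UNKNOWN
    return None


if __name__ == "__main__":
    run = [
        Event("tool_call", ok=True),
        Event("tool_result", ok=False, detail="HTTP 500 from search tool"),
        Event("check", ok=False, detail="expected file was not written"),
    ]
    print(classify(run))  # FailureKind.TOOL_ERROR
```

The classifier reports the first failing step on purpose: in agent runs, later failures are often cascades of an earlier one, so the earliest failure is usually the best place to start isolating the cause.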

"Reliable agents need the same discipline that made reliable automation possible."

🔬 Research Foundation

Finding Structure in Complex Systems

I co-developed the Triangle-Density based Clustering Technique (TDCT) to turn noisy spatial data into explainable structure. That same systems thinking now shapes how I design automation and diagnose unreliable AI agent behavior.

The published research was later expanded into a book on clustering concepts and techniques.

🎮 Interactive Game

Master Locator Strategies

A gamified learning experience for mastering XPath and CSS selectors. Solve puzzles, level up, and sharpen your automation skills.

📖 Engineering Playbook

Automation That Survives Reality

A practical guide to browser internals, flaky-test elimination, maintainable architecture, and evidence-backed delivery. These are the engineering habits that now shape how I build reliable AI agent systems.

🌌 Open Source Protocol

Starlight Protocol

Resilient browser automation through autonomous Sentinel coordination. A formal, open standard for building self-healing test systems.

Automation Roots, Agentic AI Direction

Ten years of automation work shaped the habits I now bring to agents: observe the run, control the inputs, isolate the failure, and prove the fix.

10+ years of experience

~35% average efficiency boost

High-impact cost savings

Years of global experience

Career Timeline

Automation Consultant

Present

Consulting on automation strategy and quality systems while independently building reliability tooling for AI-assisted engineering: agent run capture, validation workflows, and failure postmortems.

Extending automation discipline into observable, repeatable, evidence-backed agent workflows

▹ Designing Python-first tools for agent diagnostics and postmortems as an independent focus.
▹ Applying CI, browser automation, and failure-triage patterns to AI-agent runs.
▹ Defining guardrails for reliable local-first and AI-assisted workflows.

Senior Automation Developer

7 Years

Built and stabilized large web, API, and mobile automation programs across high-pressure delivery environments.

Reduced flaky failures by 70% across 200+ test suites

▹ Managed complex web, API, and mobile automation suites.
▹ Worked with diverse tools such as UFT, AutoIt, and QF Test.
▹ Delivered critical automation solutions for major corporate clients.
▹ Gained deep airline domain expertise.

Junior Automation Developer

3 Years

Built the foundations: reliable Selenium suites, CI integration, maintainable test design, and close QA-engineering collaboration.

Compressed manual regression from 2 weeks to 3 days

▹ Developed test scripts using Selenium WebDriver and Java.
▹ Learned CI/CD integration and version control best practices.
▹ Collaborated with QA teams on the manual-to-automation transition.
▹ Built foundational expertise in test design patterns.

Tech Arsenal

Pick the reliability gap: opaque agent runs, flaky CI, Cloudflare walls, login overhead, visual drift, or GenAI UIs. I build tools for the places normal automation and naive AI workflows break.

⭐ Flagship Tools

Agent Blackbox

Agent run postmortems

pytest-mockllm

Deterministic LLM tests
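The idea behind deterministic LLM tests can be sketched without any particular library: replace the model client with a stub that returns canned answers and rejects unexpected prompts. `FakeLLM` and `summarize_ticket` are hypothetical; this illustrates the technique, not the pytest-mockllm API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeLLM:
    """Hypothetical stand-in for an LLM client: same prompt, same answer, every run."""

    responses: dict[str, str]
    calls: list[str] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)  # record what was sent so tests can assert on it
        if prompt not in self.responses:
            # Fail loudly rather than invent output for a prompt the test did not expect.
            raise AssertionError(f"unexpected prompt: {prompt!r}")
        return self.responses[prompt]


def summarize_ticket(llm, ticket: str) -> str:
    """Hypothetical code under test that depends on an LLM."""
    return llm.complete(f"Summarize: {ticket}").strip().capitalize()


def test_summarize_ticket_is_deterministic() -> None:
    llm = FakeLLM(responses={"Summarize: login fails on Safari": "  login broken in safari "})
    assert summarize_ticket(llm, "login fails on Safari") == "Login broken in safari"
    assert llm.calls == ["Summarize: login fails on Safari"]
```

The test function runs under pytest as written. Because the stub raises on any prompt it was not given, prompt drift in the code under test surfaces as a test failure instead of a silently different answer.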

🔧 Core Capabilities

Automation Languages, Agent Diagnostics, Browser Engines, Mobile Platforms, API Testing, Pipeline Integration

👁️ Visual Testing

Visual AI, Regression Detection, RPA Vision

🔄 Platform Challenges

Shadow DOM, Framework Migration

📊 Reporting

Stakeholder Reports

Agentic AI and Automation Systems

Practical tools that show the bridge from automation to agentic AI: reliable runs, explainable failures, safer LLM integrations, resilient browser workflows, and test systems that expose risk instead of hiding it.

Recent Insights

Latest notes on automation, agentic AI, reliability, and engineering

Stop Teaching Browser Agents the Same Workflow Twice

When a browser path already worked, rediscovery is waste. Flow2Skill treats a Playwright demonstration as compiler input...

Memory Is Not a Lock: How OutcomeLock Stops Agents from Repeating Finished Work

An agent can remember that work is complete while an older action remains queued. OutcomeLock adds a deterministic evide...

The IDE Needs a Flight Recorder, Not Just an AI Chat Panel

Antigravity, Claude Code, Codex, Cursor, and JetBrains AI already prove the agentic IDE is real. The next battle is not ...
