pytest-why: Turning Pytest Failures into Actionable Engineering Guidance

June 15, 2026 · 5 min read

What pytest-why Adds

  • Explicit failure classification for assertions, fixtures, imports, timeouts, and unknown errors
  • Phase-aware diagnostics across setup, call, teardown, and collection
  • Browser automation context for Selenium and Playwright failures
  • Shareable reports in Markdown and standalone HTML
  • Full traceback preservation so guidance never replaces the original evidence

Pytest is excellent at telling us that a test failed. It identifies the node, the phase, the assertion diff, and the traceback. The harder question is often what to inspect first.

In a large suite, technically complete output can still take several minutes to classify. A failure might come from an incorrect assertion, a fixture that never initialized, a broken import, a blocked operation, or a browser interaction that ran before the page was ready.

Evidence Is Not a Diagnosis
pytest-why keeps pytest's evidence intact, then organizes it around three practical questions: what kind of failure occurred, why that category usually happens, and what the developer should inspect first.

The Problem: Classification Comes Before Debugging

Code
def test_total_price():
    subtotal = 100
    tax = 18

    assert subtotal + tax == 120  # fails: 100 + 18 is 118, not 120

For an experienced developer, the next step is straightforward: compare expected and actual values, then trace where they diverged. In a long CI log or an unfamiliar codebase, that interpretation still costs time.

  • A missing fixture fails during setup before the test body runs.
  • An import error can stop collection entirely.
  • A timeout often points to an operation that blocked earlier in the run, not to the final traceback line.
  • A Selenium or Playwright failure can indicate selector drift, timing, waits, or interactability.
  • A teardown error can reveal broken cleanup after a successful test body.

An Opt-In Diagnostic Layer

pytest-why is deliberately opt-in. Install the package, enable the plugin with one flag, and pytest continues to run normally while the diagnostic layer observes failures.

Code
python -m pip install pytest-why
pytest --why

Capability | Behavior
Runtime capture | Observes failures from setup, test execution, and teardown
Collection capture | Records errors raised while pytest discovers tests
Classification | Maps common traceback patterns to stable failure categories
Terminal summary | Prints a concise explanation and suggested first checks
Report output | Writes pytest-why-report.md and pytest-why-report.html
Traceback handling | Preserves the complete long representation from pytest

A First Example

Code
def test_checkout_total():
    actual_total = 118
    expected_total = 120

    assert actual_total == expected_total  # fails: 118 != 120

With pytest --why enabled, the normal traceback remains visible. The plugin also labels the failure as an assertion mismatch and points the developer toward expected-versus-actual values, inputs, transformations, and precision assumptions.

Designed for the Next Action
The output is intentionally concise. It does not claim to know the exact root cause; it narrows the search space and suggests the highest-value first checks.

Understanding Pytest Failure Phases

Phase | What Runs | Typical Failures
setup | Fixtures and preconditions | Missing fixtures, fixture exceptions, environment setup
call | The test function itself | Assertions, application exceptions, browser interactions
teardown | Fixture finalizers and cleanup | Cleanup errors, resource release failures
collection | Test discovery and module import | Import errors, syntax problems, broken test modules

Treating every failure as a test-body problem leads to noisy advice. pytest-why stores the pytest phase with each diagnostic so the explanation can distinguish a broken fixture from a failed assertion or cleanup problem.
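To make the idea concrete, here is a minimal sketch of a diagnostic record that carries its phase. The Diagnostic class, the PHASE_HINTS mapping, and the hint wording are assumptions made for illustration; the article does not show pytest-why's actual data model.

```python
from dataclasses import dataclass

# Hypothetical hint text per phase; the phase names follow the table above.
PHASE_HINTS = {
    "setup": "A fixture or precondition failed before the test body ran.",
    "call": "The test body itself raised or failed an assertion.",
    "teardown": "Cleanup failed after the test body finished.",
    "collection": "Pytest could not import or discover the test module.",
}


@dataclass(frozen=True)
class Diagnostic:
    """Illustrative record: one failure plus the phase it happened in."""

    nodeid: str
    phase: str
    category: str
    traceback: str

    def phase_hint(self) -> str:
        # Fall back to a neutral message rather than guessing.
        return PHASE_HINTS.get(
            self.phase, "Phase not recognized; rely on the traceback."
        )


diag = Diagnostic(
    nodeid="tests/test_cart.py::test_total",
    phase="setup",
    category="fixture_error",
    traceback="fixture 'db' not found",
)
print(f"[{diag.phase}] {diag.category}: {diag.phase_hint()}")
```

Keeping the phase on the record means the same category can be explained differently depending on where it occurred, for example an exception raised inside a fixture versus one raised in the test body.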

The Classification Model

Category | Signals | Suggested Investigation
Assertion mismatch | AssertionError or assertion output | Compare expected and actual values, inputs, transforms, and precision
Import error | ImportError, ModuleNotFoundError, or collection import output | Check dependencies, package layout, paths, and circular imports
Fixture error | Fixture lookup text or setup-phase fixture failures | Check fixture names, scopes, dependencies, and conftest discovery
Timeout | TimeoutError or timeout-related traceback text | Inspect blocked I/O, waits, external dependencies, and timing assumptions
Unknown failure | No supported rule matched | Use the phase, exception, and full traceback as the investigation starting point

The classifier favors transparent, deterministic rules over opaque prediction. That makes behavior easy to test and explain, while keeping the unknown category honest when the evidence does not support a stronger claim.
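A rule-based classifier of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not pytest-why's real rule set: the RULES list, the regular expressions, the category identifiers, and the setup-phase fallback are all hypothetical.

```python
import re

# Ordered (category, pattern) pairs; the first match wins.
# The patterns are illustrative guesses at the signals listed in the table.
RULES = [
    ("import_error", re.compile(r"\b(ModuleNotFoundError|ImportError)\b")),
    ("fixture_error", re.compile(r"fixture '[^']+' not found")),
    ("timeout", re.compile(r"\bTimeoutError\b|\btimed out\b", re.IGNORECASE)),
    ("assertion_mismatch", re.compile(r"\bAssertionError\b")),
]


def classify(traceback_text: str, phase: str = "call") -> str:
    """Return a category name, or "unknown" when no rule matches."""
    for category, pattern in RULES:
        if pattern.search(traceback_text):
            return category
    # Assumption: an otherwise unmatched setup failure is treated as a
    # fixture problem, since setup is where fixtures run.
    if phase == "setup":
        return "fixture_error"
    return "unknown"


print(classify("E   AssertionError: assert 118 == 120"))
print(classify("E   KeyError: 'price'"))
```

Because the rules are plain data evaluated in a fixed order, each category can be unit tested with a single traceback string, and the unknown fallback is an explicit code path instead of a default guess.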

Unknown Is a Valid Result
A diagnostic tool should not manufacture confidence. Unknown failures still include the phase and full traceback, but avoid presenting a guess as a diagnosis.

Browser Automation Context

Browser failures often share generic exception shapes, so pytest-why enriches supported Selenium and Playwright traces with domain-specific checks.

  • Confirm that the locator still matches the intended element.
  • Check whether navigation, rendering, or network activity completed before the action.
  • Verify visibility, enabled state, and interactability.
  • Review explicit waits and remove timing assumptions hidden in fixed sleeps.
  • Inspect whether frames, popups, shadow roots, or page transitions changed the browser context.
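The checks above could be attached to a failure with a small enricher like the following sketch. The function names and the signal-to-check mapping are assumptions for illustration; the exception names are real Selenium and Playwright error types, but how pytest-why actually detects them is not described in the article.

```python
from typing import List, Optional


def detect_framework(traceback_text: str) -> Optional[str]:
    """Guess the browser framework from module names in the traceback."""
    lowered = traceback_text.lower()
    for name in ("selenium", "playwright"):
        if name in lowered:
            return name
    return None


# Hypothetical mapping from exception-name fragments to suggested checks.
SIGNAL_CHECKS = [
    ("NoSuchElementException",
     "Confirm that the locator still matches the intended element."),
    ("ElementNotInteractableException",
     "Verify visibility, enabled state, and interactability."),
    ("StaleElementReferenceException",
     "Inspect whether a page transition or re-render replaced the element."),
    ("TimeoutException",
     "Review explicit waits and remove timing assumptions hidden in fixed sleeps."),
    ("TimeoutError",
     "Review explicit waits and remove timing assumptions hidden in fixed sleeps."),
]

GENERIC_CHECK = (
    "Check whether navigation, rendering, or network activity "
    "completed before the action."
)


def browser_checks(traceback_text: str) -> List[str]:
    """Return suggested checks, or an empty list for non-browser failures."""
    if detect_framework(traceback_text) is None:
        return []
    matched = [check for signal, check in SIGNAL_CHECKS if signal in traceback_text]
    matched.append(GENERIC_CHECK)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(matched))


trace = "selenium.common.exceptions.NoSuchElementException: Message: no such element"
for check in browser_checks(trace):
    print("-", check)
```

Returning an empty list for traces that mention neither framework keeps the enrichment from leaking browser advice into unrelated failures.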

How the Plugin Integrates with Pytest

Code
pytest run
   |
   +-- collection reports
   +-- setup / call / teardown reports
   |
   v
pytest-why classifier
   |
   +-- terminal summary
   +-- Markdown report
   +-- standalone HTML report

Hook | Responsibility
pytest_addoption | Registers the --why command-line flag
pytest_configure | Initializes plugin state only when the flag is enabled
pytest_runtest_logreport | Receives setup, call, and teardown reports
pytest_collectreport | Receives collection failures
pytest_terminal_summary | Prints the final diagnostic summary

The package exposes a pytest11 entry point, allowing pytest to discover the plugin after installation. The command-line flag controls whether diagnostic state and report generation are active.

Code
# pyproject.toml
[project.entry-points.pytest11]
why = "pytest_why.plugin"
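The hook wiring described in the table can be sketched as a conftest.py-style module. This is a simplified illustration and not the plugin's source: the WhyCollector class, the registered plugin name, and the --why-sketch flag are hypothetical (a different flag name is used so the sketch does not collide with the real --why option if pytest-why is installed). The hook names and the report attributes failed, when, nodeid, and longreprtext are part of pytest's public API.

```python
class WhyCollector:
    """Hypothetical collector that records failed reports with their phase."""

    def __init__(self):
        self.failures = []

    def _record(self, report, phase):
        if report.failed:
            self.failures.append(
                {
                    "nodeid": report.nodeid,
                    "phase": phase,
                    # longreprtext is pytest's full text form of the failure.
                    "traceback": report.longreprtext,
                }
            )

    def pytest_runtest_logreport(self, report):
        # report.when is "setup", "call", or "teardown".
        self._record(report, report.when)

    def pytest_collectreport(self, report):
        self._record(report, "collection")

    def pytest_terminal_summary(self, terminalreporter):
        if not self.failures:
            return
        terminalreporter.section("why (sketch)")
        for failure in self.failures:
            terminalreporter.write_line(
                f"{failure['nodeid']} failed during {failure['phase']}"
            )


def pytest_addoption(parser):
    parser.addoption(
        "--why-sketch",
        action="store_true",
        default=False,
        help="Collect failure diagnostics (illustrative flag).",
    )


def pytest_configure(config):
    # Register the collector only when the flag is set, so a normal run
    # carries no extra state.
    if config.getoption("--why-sketch"):
        config.pluginmanager.register(WhyCollector(), "why-collector-sketch")
```

Registering the collector object from pytest_configure is one way to keep the plugin inert unless the flag is passed, which matches the opt-in behavior described above.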

Why the Raw Traceback Is Preserved

A concise explanation is useful for orientation. The traceback remains the source of truth for investigation.

pytest-why stores pytest's complete long representation instead of reducing failures to an exception message. This preserves stack frames, assertion introspection, chained exceptions, fixture context, collection details, and plugin-provided traceback information.

Markdown and HTML Reports

Every enabled run writes a Markdown report and a self-contained HTML report. Both formats group the classification, explanation, suggested next steps, phase, and original traceback into an artifact that can move beyond the terminal.

  • Attach the Markdown report to an issue or pull request.
  • Upload both reports as CI artifacts.
  • Share the HTML file with teammates who do not have the local environment.
  • Keep the traceback alongside the guidance for review and escalation.

Report Safety Matters
The Markdown writer selects a code fence longer than any backtick run inside the traceback. The HTML writer escapes dynamic traceback content before placing it in the document.
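Both safeguards are small enough to sketch with the standard library. The function names below are hypothetical and the real writers may differ in detail; the technique shown is choosing a fence one backtick longer than the longest backtick run in the content, and escaping HTML special characters before embedding the text.

```python
import html
import re


def markdown_fence(text: str) -> str:
    """Return a backtick fence longer than any backtick run in the text."""
    runs = re.findall(r"`+", text)
    longest = max((len(run) for run in runs), default=0)
    # A fence needs at least three backticks.
    return "`" * max(3, longest + 1)


def markdown_block(traceback_text: str) -> str:
    """Wrap the traceback in a fence that its own content cannot close."""
    fence = markdown_fence(traceback_text)
    return f"{fence}text\n{traceback_text}\n{fence}"


def html_block(traceback_text: str) -> str:
    """Escape <, >, &, and quotes so the traceback renders as plain text."""
    return f"<pre>{html.escape(traceback_text)}</pre>"


print(html_block("assert '<b>' in page"))
```

Without the longer fence, a traceback that quotes a Markdown code block could end the report's fence early and turn the rest of the traceback into formatted text.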

Collection Errors Are First-Class Failures

Many plugins focus only on executed tests, but import and discovery failures can stop a suite before any test item exists. pytest-why listens for collection reports separately and routes those errors through the same classification and reporting pipeline.

Code
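# If missing_dependency is not installed, this import raises
# ModuleNotFoundError while pytest collects the module, so test_client
# never becomes a test item.
from missing_dependency import client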

def test_client():
    assert client.is_ready()

Using Reports in CI

Code
- name: Run tests with diagnostics
  run: pytest --why

- name: Upload pytest-why reports
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: pytest-why-reports
    path: |
      pytest-why-report.md
      pytest-why-report.html

The if: always() condition is important: diagnostic artifacts are most valuable when tests fail, and a failed test step is exactly when an upload step without that condition would be skipped.

Testing a Pytest Plugin

The project uses pytester to create temporary test suites and execute nested pytest runs. This verifies real plugin discovery, option handling, terminal output, collection behavior, and report generation rather than testing only isolated helper functions.

  • Classifier unit tests for each supported category and fallback behavior.
  • Reporter tests for Markdown fences and HTML escaping.
  • End-to-end runs for setup, call, teardown, and collection failures.
  • Disabled-mode checks confirming normal pytest behavior without --why.
  • Artifact checks confirming both report files are written.

Current Design Boundaries

The current release is intentionally focused: deterministic rules, concise guidance, two human-readable report formats, and an opt-in workflow. It does not claim to infer an exact root cause or replace team-specific debugging knowledge. The following capabilities are outside the current release:

  • Configurable custom classification rules.
  • Configurable report paths and filenames.
  • Machine-readable JSON output.
  • Grouping and deduplication of repeated failures.
  • Source-code links for supported CI providers.
  • Additional domain-specific enrichers beyond browser automation.

Getting Started

Code
python -m pip install -U pytest-why
pytest --why

Explore the pytest-why project page, install it from PyPI, or review the source on GitHub.

From Failure to a Useful First Move
pytest-why does not make tracebacks shorter by discarding evidence. It makes them easier to act on by adding classification, context, and a practical starting point.
Dhiraj Das

About the Author

Dhiraj Das | Senior Automation Consultant | 10+ years building test automation that actually works. He transforms flaky, slow regression suites into reliable CI pipelines—designing self-healing frameworks that don't just run tests, but understand them.

Creator of many open-source tools solving what traditional automation can't: waitless (flaky tests), sb-stealth-wrapper (bot detection), selenium-teleport (state persistence), selenium-chatbot-test (AI chatbot testing), lumos-shadowdom (Shadow DOM), and visual-guard (visual regression).
