Defects & flaky tests

Two of the highest-friction parts of test management are linking a failing test to a real bug ticket and working out which tests are consistently broken and which are only flaky.

Defects

A defect in Kaizen is a record that links one or more failing test cases to an external bug tracker issue (Jira, GitHub Issues, Linear, GitLab Issues).

How defects appear

Three sources:

  1. From your reporter annotations:

    test.info().annotations.push({ type: 'jira', description: 'ACME-1234' });
    

    On ingest, Kaizen parses labels.jira and creates or updates the linked defect automatically.

  2. Manual link from a failing case in the UI:

    • Open a failing case → Link defect → paste a tracker URL
    • Kaizen resolves the title, status, assignee, and priority via the tracker's API

  3. AI clustering (Team plan and above):

    • Failures with similar stack traces / error messages get grouped automatically
    • The cluster suggests an existing defect or proposes opening a new one
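The annotation route (source 1 above) can be pictured roughly as follows. This is a minimal sketch, not Kaizen's actual ingest code: the Annotation shape mirrors Playwright's { type, description } annotation objects, while TRACKER_TYPES, DefectRef, and extractDefectRefs are hypothetical names introduced only for illustration.

```typescript
// Hypothetical sketch: collect tracker references from test annotations.
// The annotation shape matches Playwright's { type, description } objects;
// everything else is an assumption, not Kaizen's real ingest logic.
interface Annotation {
  type: string;
  description?: string;
}

interface DefectRef {
  tracker: string; // e.g. "jira"
  key: string;     // e.g. "ACME-1234"
}

// Assumed set of annotation types that are treated as tracker links.
const TRACKER_TYPES = new Set(["jira", "github", "linear", "gitlab"]);

function extractDefectRefs(annotations: Annotation[]): DefectRef[] {
  const seen = new Set<string>();
  const refs: DefectRef[] = [];
  for (const a of annotations) {
    const key = a.description?.trim();
    // Ignore non-tracker annotations and tracker annotations without a key.
    if (!TRACKER_TYPES.has(a.type) || !key) continue;
    const id = `${a.type}:${key}`;
    if (seen.has(id)) continue; // the same ticket annotated twice counts once
    seen.add(id);
    refs.push({ tracker: a.type, key });
  }
  return refs;
}
```

Deduplicating on tracker plus key means a test that carries the same ticket annotation twice would still map to a single defect in this sketch.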

The Defects tab

Lists every defect with rolled-up state across cases:

  • Cases affected — how many test cases reference this defect
  • First seen / last seen — earliest and latest run timestamps
  • Tracker status — synced from the linked tracker (Jira, GitHub, Linear, or GitLab)
  • Owner — assignee from the tracker
  • Severity — derived from the failing case severity

Click in to see the full case list, the runs they failed in, and a stack-trace excerpt.
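The rolled-up columns can be thought of as a simple aggregation over failure records. The sketch below is illustrative only: FailureRecord, DefectRollup, and rollUpDefects are hypothetical names, and it assumes run timestamps are ISO 8601 UTC strings in one consistent format.

```typescript
// Hypothetical sketch of the per-defect rollup; field names are assumptions.
interface FailureRecord {
  defectKey: string;
  caseId: string;
  runAt: string; // ISO 8601 UTC timestamp of the run
}

interface DefectRollup {
  casesAffected: number; // distinct test cases referencing the defect
  firstSeen: string;
  lastSeen: string;
}

function rollUpDefects(failures: FailureRecord[]): Map<string, DefectRollup> {
  const cases = new Map<string, Set<string>>();
  const out = new Map<string, DefectRollup>();
  for (const f of failures) {
    const caseSet = cases.get(f.defectKey) ?? new Set<string>();
    caseSet.add(f.caseId);
    cases.set(f.defectKey, caseSet);

    const prev = out.get(f.defectKey);
    out.set(f.defectKey, {
      casesAffected: caseSet.size,
      // Same-format ISO 8601 UTC strings sort chronologically as plain strings.
      firstSeen: prev && prev.firstSeen < f.runAt ? prev.firstSeen : f.runAt,
      lastSeen: prev && prev.lastSeen > f.runAt ? prev.lastSeen : f.runAt,
    });
  }
  return out;
}
```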

Integrations

Set up under Project Settings → Integrations:

  • Jira Cloud / Server — OAuth app or API token
  • GitHub Issues — repo-scoped GitHub App
  • Linear — workspace token
  • GitLab — project access token

When the tracker resolves a ticket, the linked Kaizen defect auto-closes (and reopens if the test fails again).

Flaky tests

A test is flaky when it produces both pass and fail outcomes against the same code (no relevant change between runs). Kaizen's flake detection is per-case, per-branch, with a rolling window.

How it's computed

Default formula:

  • Look at the last N=20 runs on the same branch
  • A case is considered flaky if its pass-rate is between 0.05 and 0.95 and it has at least one pass and one fail in the window
  • Severity is graded by failure rate: low < 25% fails, medium 25-60%, high > 60%

Tunable per-project under Settings → Flake Detection.
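To make the default rule concrete, here is a minimal sketch of it. It is not Kaizen's implementation: classifyFlake, FlakeOptions, and FlakeResult are hypothetical names, and two details are assumptions because the rule above does not spell them out: the 0.05 and 0.95 bounds are treated as inclusive, and a failure rate of exactly 25% or 60% is graded medium.

```typescript
type Outcome = "pass" | "fail";
type Severity = "low" | "medium" | "high";

interface FlakeOptions {
  windowSize: number;  // N, the number of most recent runs to consider
  minPassRate: number; // lower pass-rate bound (assumed inclusive)
  maxPassRate: number; // upper pass-rate bound (assumed inclusive)
}

const DEFAULTS: FlakeOptions = { windowSize: 20, minPassRate: 0.05, maxPassRate: 0.95 };

interface FlakeResult {
  flaky: boolean;
  passRate: number;
  severity: Severity | null; // null when the case is not flagged
}

// `outcomes` holds one case's results on one branch, ordered oldest to newest.
function classifyFlake(outcomes: Outcome[], opts: FlakeOptions = DEFAULTS): FlakeResult {
  const recent = outcomes.slice(-opts.windowSize);
  if (recent.length === 0) return { flaky: false, passRate: 0, severity: null };

  const passes = recent.filter((o) => o === "pass").length;
  const fails = recent.length - passes;
  const passRate = passes / recent.length;

  const flaky =
    passes > 0 &&
    fails > 0 &&
    passRate >= opts.minPassRate &&
    passRate <= opts.maxPassRate;
  if (!flaky) return { flaky, passRate, severity: null };

  // Grade by failure rate: low < 25%, medium 25-60%, high > 60%.
  const failRate = fails / recent.length;
  const severity: Severity = failRate < 0.25 ? "low" : failRate <= 0.6 ? "medium" : "high";
  return { flaky, passRate, severity };
}
```

Under these assumptions, a case with 18 passes and 2 fails in its last 20 runs would be flagged with low severity, while a case that failed in the past but passed all of its last 20 runs would not be flagged at all.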

The Flaky tab

  • Lists every case currently flagged
  • Sortable by failure rate, severity, owner, last seen
  • Each row shows the rolling pass-rate as a sparkline + the runs it has flaked in
  • Click in to see all step diffs across runs (passes vs fails — what changed?)

Acting on flakes

For each flagged case:

  • Quarantine — exclude from gating dashboards while you investigate. Still runs in CI; just doesn't fail the rollup
  • Open defect — link a tracker ticket
  • Mark intentional — for tests that are meant to retry-with-randomness (rare; useful for property tests)
  • Suggest owner — auto-derived from CODEOWNERS
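The effect of quarantine on gating can be sketched as a rollup that still sees every result but only lets non-quarantined failures block. CaseResult and gateStatus are hypothetical names, not part of Kaizen's API.

```typescript
// Hypothetical sketch: quarantined cases still run and report results,
// but their failures do not fail the gating rollup.
interface CaseResult {
  caseId: string;
  status: "pass" | "fail";
  quarantined: boolean;
}

function gateStatus(results: CaseResult[]): "pass" | "fail" {
  const blocking = results.filter((r) => r.status === "fail" && !r.quarantined);
  return blocking.length === 0 ? "pass" : "fail";
}
```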

Rerun-failed flow

If your run.env.runUrl is set, the embedded Kensho viewer offers a Re-run failed action. The platform calls your CI's webhook with the failing case IDs as input — your workflow re-runs only those. The new run lands in Kaizen as a child of the original; flake detection accounts for it.
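As a rough illustration of what the re-run request could carry, the sketch below collects the failing case IDs from a run and pairs them with the parent run. The payload shape is an assumption: this page does not specify the webhook contract, and RerunRequest, buildRerunRequest, parentRunId, and caseIds are hypothetical names.

```typescript
// Hypothetical payload shape; the real webhook contract is not documented here.
interface RerunRequest {
  parentRunId: string; // the original run, so the new run can be recorded as its child
  caseIds: string[];   // only the cases that failed
}

interface RunCaseResult {
  caseId: string;
  status: "pass" | "fail";
}

function buildRerunRequest(parentRunId: string, results: RunCaseResult[]): RerunRequest | null {
  const failed = results.filter((r) => r.status === "fail").map((r) => r.caseId);
  const caseIds = Array.from(new Set(failed)); // drop duplicate IDs
  // Nothing failed, so there is nothing to re-run.
  return caseIds.length > 0 ? { parentRunId, caseIds } : null;
}
```

On the CI side, a workflow receiving such a request would restrict its test selection to the listed case IDs. How that filter is expressed depends on your test runner.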