Defects & flaky tests
This page covers two of the highest-friction parts of test management: linking a failing test to a real bug ticket, and telling tests that are reliably broken apart from tests that are flaky.
Defects
A defect in Kaizen is a record that links one or more failing test cases to an external bug tracker issue (Jira, GitHub Issues, Linear, GitLab Issues).
How defects appear
Three sources:
- From your reporter annotations:
    test.info().annotations.push({ type: 'jira', description: 'ACME-1234' });
  On ingest, Kaizen parses labels.jira and creates / updates the linked defect automatically.
- Manual link from a failing case in the UI:
  - Open a failing case → Link defect → paste a tracker URL
  - We resolve the title, status, assignee, priority via the tracker's API
- AI clustering (Team plan and above):
  - Failures with similar stack traces / error messages get grouped automatically
  - The cluster suggests an existing defect or proposes opening a new one
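If many tests carry tracker annotations, a small helper can keep the issue keys consistent. The following is a minimal sketch: `jiraAnnotation` and the key pattern are assumptions made for illustration and are not part of Kaizen or Playwright. Only the `{ type, description }` shape comes from the annotation shown above.

```typescript
// Hypothetical helper that builds the annotation object used by the reporter.
interface Annotation {
  type: string;
  description: string;
}

// Assumption: issue keys look like "ACME-1234"
// (uppercase project key, a dash, then a number).
const ISSUE_KEY = /^[A-Z][A-Z0-9]*-\d+$/;

function jiraAnnotation(issueKey: string): Annotation {
  const key = issueKey.trim();
  if (!ISSUE_KEY.test(key)) {
    // Fail in the test run instead of sending an unlinkable key on ingest.
    throw new Error(`Not a recognizable issue key: "${issueKey}"`);
  }
  return { type: 'jira', description: key };
}
```

Inside a Playwright test this would be used as `test.info().annotations.push(jiraAnnotation('ACME-1234'));`. If your tracker uses a different key format, adjust the pattern accordingly.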
The Defects tab
Lists every defect with rolled-up state across cases:
- Cases affected — how many test cases reference this defect
- First seen / last seen — earliest and latest run timestamps
- Tracker status — synced from Jira/GitHub/Linear
- Owner — assignee from the tracker
- Severity — derived from the failing case severity
Click in to see the full case list, the runs they failed in, and a stack-trace excerpt.
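The rolled-up columns can be read as an aggregation over the failures linked to a defect. The sketch below illustrates that idea only. The `Failure` and `DefectRollup` shapes are hypothetical, and picking the highest case severity is an assumption, since this page says only that severity is derived from the failing cases.

```typescript
// Hypothetical shapes; Kaizen's real data model is not shown on this page.
type Severity = 'low' | 'medium' | 'high';

interface Failure {
  caseId: string;
  runAt: string;      // ISO-8601 UTC timestamp of the run
  severity: Severity; // severity of the failing case
}

interface DefectRollup {
  casesAffected: number;
  firstSeen: string;
  lastSeen: string;
  severity: Severity;
}

const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2 };

function rollUp(failures: Failure[]): DefectRollup | null {
  if (failures.length === 0) return null;
  // Count distinct cases, not individual failures.
  const cases = new Set(failures.map((f) => f.caseId));
  // ISO-8601 UTC strings of the same format sort chronologically as text.
  const times = failures.map((f) => f.runAt).sort();
  // Assumption: the defect takes the highest severity among its cases.
  const severity = failures
    .map((f) => f.severity)
    .reduce((a, b) => (RANK[b] > RANK[a] ? b : a));
  return {
    casesAffected: cases.size,
    firstSeen: times[0],
    lastSeen: times[times.length - 1],
    severity,
  };
}
```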
Integrations
Set up under Project Settings → Integrations:
- Jira Cloud / Server — OAuth app or API token
- GitHub Issues — repo-scoped GitHub App
- Linear — workspace token
- GitLab — project access token
When the tracker resolves a ticket, the linked Kaizen defect auto-closes (and reopens if the test fails again).
Flaky tests
A test is flaky when it produces both pass and fail outcomes against the same code (no relevant change between runs). Kaizen's flake detection is per-case, per-branch, with a rolling window.
How it's computed
Default formula:
- Look at the last N=20 runs on the same branch
- A case is considered flaky if its pass-rate is between 0.05 and 0.95 and it has at least one pass and one fail in the window
- Severity is graded by failure rate: low < 25% fails, medium 25-60%, high > 60%
Tunable per-project under Settings → Flake Detection.
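The default formula can be sketched as a small function. This is an illustration under stated assumptions, not Kaizen's implementation: the names are hypothetical, the pass-rate bounds are treated as inclusive, and the handling of failure rates of exactly 25% or 60% is a guess.

```typescript
type Outcome = 'pass' | 'fail';
type FlakeSeverity = 'low' | 'medium' | 'high';

interface FlakeOptions {
  windowSize: number;  // N most recent runs on the branch (must be > 0)
  minPassRate: number;
  maxPassRate: number;
}

interface FlakeResult {
  flaky: boolean;
  passRate: number;
  severity: FlakeSeverity | null;
}

const DEFAULTS: FlakeOptions = { windowSize: 20, minPassRate: 0.05, maxPassRate: 0.95 };

// `outcomes` is ordered oldest to newest, all from the same branch.
function detectFlake(outcomes: Outcome[], options: FlakeOptions = DEFAULTS): FlakeResult {
  const recent = outcomes.slice(-options.windowSize);
  const passes = recent.filter((o) => o === 'pass').length;
  const fails = recent.length - passes;
  const passRate = recent.length === 0 ? 0 : passes / recent.length;

  // Flaky: at least one pass and one fail, with the pass rate inside the band.
  const flaky =
    passes > 0 &&
    fails > 0 &&
    passRate >= options.minPassRate &&
    passRate <= options.maxPassRate;
  if (!flaky) return { flaky, passRate, severity: null };

  const failRate = fails / recent.length;
  // Assumption: exactly 25% and exactly 60% both count as medium.
  const severity: FlakeSeverity =
    failRate < 0.25 ? 'low' : failRate <= 0.6 ? 'medium' : 'high';
  return { flaky, passRate, severity };
}
```

For example, a window of 16 passes and 4 fails has a pass rate of 0.8 and a failure rate of 0.2, so the case is flagged with low severity. A case that failed ten times and then passed its 20 most recent runs is not flagged, because the older failures fall outside the window.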
The Flaky tab
- Lists every case currently flagged
- Sortable by failure rate, severity, owner, last seen
- Each row shows the rolling pass-rate as a sparkline + the runs it has flaked in
- Click in to see all step diffs across runs (passes vs fails — what changed?)
Acting on flakes
For each flagged case:
- Quarantine — exclude from gating dashboards while you investigate. Still runs in CI; just doesn't fail the rollup
- Open defect — link a tracker ticket
- Mark intentional — for tests that are meant to retry-with-randomness (rare; useful for property tests)
- Suggest owner — auto-derived from CODEOWNERS
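The effect of quarantine on gating can be sketched as follows. The `CaseResult` shape and `gateStatus` function are hypothetical and only illustrate the rule described above: a quarantined case still runs and still reports its result, but its failure does not fail the rollup.

```typescript
// Hypothetical result shape for one case in a run.
interface CaseResult {
  caseId: string;
  status: 'pass' | 'fail';
  quarantined: boolean;
}

// The run gates as failed only if a non-quarantined case failed.
function gateStatus(results: CaseResult[]): 'pass' | 'fail' {
  const blocking = results.filter((r) => r.status === 'fail' && !r.quarantined);
  return blocking.length > 0 ? 'fail' : 'pass';
}
```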
Rerun-failed flow
If your run.env.runUrl is set, the embedded Kensho viewer offers a Re-run failed action. The platform calls your CI's webhook with the failing case IDs as input — your workflow re-runs only those. The new run lands in Kaizen as a child of the original; flake detection accounts for it.
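On the CI side, the workflow needs to turn the received case IDs into a test selection. The sketch below shows one possible way to do that for a Playwright suite. The `RerunPayload` shape is hypothetical (the actual webhook body is not documented on this page), and it assumes each case ID appears verbatim in the test title.

```typescript
// Hypothetical payload shape; check what your webhook actually receives.
interface RerunPayload {
  parentRunId: string;
  caseIds: string[];
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Assumption: each case ID appears verbatim in the test title,
// e.g. "[C-101] login works".
function grepPattern(payload: RerunPayload): string {
  if (payload.caseIds.length === 0) {
    // An empty pattern would match every test, so refuse it.
    throw new Error('No case IDs to re-run');
  }
  return payload.caseIds.map(escapeRegExp).join('|');
}
```

The resulting pattern can be passed to Playwright's `--grep` option, for example `npx playwright test --grep "C-101|C-204"`, so that only the failing cases run again.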