Defects & flaky tests

Two of the highest-friction parts of test management are linking a failing test to a real bug ticket and telling tests that are reliably broken apart from tests that are merely flaky.

Defects

A defect in Kaizen is a record that links one or more failing test cases to an external bug tracker issue (Jira, GitHub Issues, Linear, GitLab Issues).

How defects appear

Three sources:

From your reporter annotations:
```
// Inside a Playwright test body: tag the test with its tracker ticket key.
test.info().annotations.push({ type: 'jira', description: 'ACME-1234' });
```
On ingest, Kaizen parses labels.jira and creates or updates the linked defect automatically.
Manual link from a failing case in the UI:
- Open a failing case → Link defect → paste a tracker URL
- Kaizen resolves the title, status, assignee, and priority through the tracker's API
AI clustering (Team plan and above):
- Failures with similar stack traces / error messages get grouped automatically
- The cluster suggests an existing defect or proposes opening a new one
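To make the first source more concrete, here is a minimal sketch of how an ingest step might collect ticket keys from a test's annotations. The `Annotation` shape mirrors the `{ type, description }` objects pushed in the snippet above; the `extractTicketKeys` helper and the key pattern are assumptions for illustration, not Kaizen's actual parser.

```typescript
// Shape of a reporter annotation, as pushed via test.info().annotations.
interface Annotation {
  type: string;
  description?: string;
}

// Hypothetical helper: collect unique ticket keys (e.g. "ACME-1234") from
// annotations of the given type. The key pattern is an assumption.
function extractTicketKeys(annotations: Annotation[], type = "jira"): string[] {
  const keyPattern = /^[A-Z][A-Z0-9]*-\d+$/;
  const keys = new Set<string>();
  for (const a of annotations) {
    const key = a.description?.trim();
    if (a.type === type && key && keyPattern.test(key)) {
      keys.add(key);
    }
  }
  return Array.from(keys);
}

const keys = extractTicketKeys([
  { type: "jira", description: "ACME-1234" },
  { type: "slow" }, // unrelated annotation, ignored
  { type: "jira", description: "ACME-1234" }, // duplicate, collapsed
]);
console.log(keys);
```

Deduplicating keys means a test that carries the same annotation twice would still map to a single defect.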

The Defects tab

Lists every defect with rolled-up state across cases:

Cases affected — how many test cases reference this defect
First seen / last seen — earliest and latest run timestamps
Tracker status — synced from the linked tracker (Jira, GitHub, Linear, or GitLab)
Owner — assignee from the tracker
Severity — derived from the failing case severity

Click in to see the full case list, the runs they failed in, and a stack-trace excerpt.
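The rolled-up fields listed above could be derived from individual case failures roughly as follows. This is a sketch under stated assumptions: the `CaseFailure` and `DefectRollup` types are hypothetical, and severity is assumed to be the highest severity among the failing cases, which the text does not spell out.

```typescript
type Severity = "low" | "medium" | "high";

// Hypothetical record: one failure of one case in one run.
interface CaseFailure {
  caseId: string;
  runTimestamp: string; // UTC ISO 8601 in one format, so strings sort by time
  severity: Severity;
}

interface DefectRollup {
  casesAffected: number;
  firstSeen: string;
  lastSeen: string;
  severity: Severity;
}

const severityRank: Record<Severity, number> = { low: 0, medium: 1, high: 2 };

// Assumption: the defect takes the highest severity among its failing cases.
function rollUpDefect(failures: CaseFailure[]): DefectRollup | undefined {
  if (failures.length === 0) return undefined;
  const caseIds = new Set<string>();
  let firstSeen = failures[0].runTimestamp;
  let lastSeen = failures[0].runTimestamp;
  let severity: Severity = failures[0].severity;
  for (const f of failures) {
    caseIds.add(f.caseId);
    if (f.runTimestamp < firstSeen) firstSeen = f.runTimestamp;
    if (f.runTimestamp > lastSeen) lastSeen = f.runTimestamp;
    if (severityRank[f.severity] > severityRank[severity]) severity = f.severity;
  }
  return { casesAffected: caseIds.size, firstSeen, lastSeen, severity };
}

const rollup = rollUpDefect([
  { caseId: "login", runTimestamp: "2024-05-02T10:00:00Z", severity: "medium" },
  { caseId: "checkout", runTimestamp: "2024-05-01T09:00:00Z", severity: "high" },
  { caseId: "login", runTimestamp: "2024-05-03T11:00:00Z", severity: "medium" },
]);
console.log(rollup);
```

Here the `login` case fails twice but counts once toward cases affected, and the defect inherits `high` from `checkout`.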

Integrations

Set up under Project Settings → Integrations:

Jira Cloud / Server — OAuth app or API token
GitHub Issues — repo-scoped GitHub App
Linear — workspace token
GitLab — project access token

When the tracker resolves a ticket, the linked Kaizen defect auto-closes (and reopens if the test fails again).
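The close and reopen behaviour can be pictured as a small state machine. The sketch below is illustrative only: the state and event names are hypothetical, and it assumes that a passing test on its own leaves the defect state unchanged.

```typescript
type DefectState = "open" | "closed";

// Hypothetical event names for the sync behaviour described above.
type DefectEvent = "tracker_resolved" | "linked_test_failed" | "linked_test_passed";

function nextDefectState(current: DefectState, event: DefectEvent): DefectState {
  if (event === "tracker_resolved") return "closed"; // ticket resolved: auto-close
  if (event === "linked_test_failed") return "open"; // failure after close: reopen
  return current; // assumption: a pass alone changes nothing
}

const events: DefectEvent[] = ["tracker_resolved", "linked_test_passed", "linked_test_failed"];
const history: DefectState[] = [];
let state: DefectState = "open";
for (const e of events) {
  state = nextDefectState(state, e);
  history.push(state);
}
console.log(history.join(" -> ")); // closed -> closed -> open
```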

Flaky tests

A test is flaky when it produces both pass and fail outcomes against the same code (no relevant change between runs). Kaizen's flake detection is per-case, per-branch, with a rolling window.

How it's computed

Default formula:

Look at the last N=20 runs on the same branch
A case is considered flaky if its pass-rate is between 0.05 and 0.95 and it has at least one pass and one fail in the window
Severity is graded by failure rate within the window: low (under 25% of runs fail), medium (25-60%), high (over 60%)

Tunable per-project under Settings → Flake Detection.
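The default formula can be sketched as a pure function with the window size and thresholds as parameters. The names (`FlakeConfig`, `detectFlake`) are hypothetical, and two details are assumptions because the text leaves them open: the pass-rate bounds are treated as inclusive, and failure rates of exactly 25% and 60% are graded as medium.

```typescript
type Outcome = "pass" | "fail";
type FlakeSeverity = "low" | "medium" | "high";

interface FlakeConfig {
  windowSize: number; // N most recent runs on the branch
  minPassRate: number;
  maxPassRate: number;
}

const defaultConfig: FlakeConfig = { windowSize: 20, minPassRate: 0.05, maxPassRate: 0.95 };

interface FlakeVerdict {
  flaky: boolean;
  failureRate: number;
  severity?: FlakeSeverity; // only set when flaky
}

// outcomes: results for one case on one branch, oldest first.
function detectFlake(outcomes: Outcome[], config: FlakeConfig = defaultConfig): FlakeVerdict {
  const recent = outcomes.slice(-config.windowSize);
  if (recent.length === 0) return { flaky: false, failureRate: 0 };
  const fails = recent.filter((o) => o === "fail").length;
  const passes = recent.length - fails;
  const passRate = passes / recent.length;
  const failureRate = fails / recent.length;
  // Assumption: bounds are inclusive.
  const flaky =
    passes > 0 && fails > 0 &&
    passRate >= config.minPassRate && passRate <= config.maxPassRate;
  if (!flaky) return { flaky, failureRate };
  // Assumption: exactly 25% and exactly 60% both count as medium.
  const severity: FlakeSeverity =
    failureRate < 0.25 ? "low" : failureRate <= 0.6 ? "medium" : "high";
  return { flaky, failureRate, severity };
}

// Small helper for building example histories.
const repeat = (o: Outcome, n: number): Outcome[] => Array.from({ length: n }, () => o);

const verdict = detectFlake(repeat("pass", 17).concat(repeat("fail", 3)));
console.log(verdict.flaky, verdict.severity); // true low
```

Note that with inclusive bounds and a full 20-run window, any case with at least one pass and one fail already has a pass-rate between 0.05 and 0.95, so the bounds only start to matter once you tune the window or thresholds.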

The Flaky tab

Lists every case currently flagged
Sortable by failure rate, severity, owner, last seen
Each row shows the rolling pass-rate as a sparkline, plus the runs in which the case has flaked
Click in to see all step diffs across runs (passes vs fails — what changed?)

Acting on flakes

For each flagged case:

Quarantine — exclude from gating dashboards while you investigate. Still runs in CI; just doesn't fail the rollup
Open defect — link a tracker ticket
Mark intentional — for tests that are meant to retry-with-randomness (rare; useful for property tests)
Suggest owner — auto-derived from CODEOWNERS
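To illustrate what quarantine means for gating, the sketch below computes a rollup in which quarantined failures are still reported but do not fail the result. The `CaseResult` type and `gateRollup` function are hypothetical, not Kaizen's API.

```typescript
interface CaseResult {
  caseId: string;
  status: "pass" | "fail";
}

// Hypothetical gating rollup: quarantined cases still run and are listed,
// but only non-quarantined failures decide whether the rollup passes.
function gateRollup(results: CaseResult[], quarantined: Set<string>) {
  const failures = results.filter((r) => r.status === "fail");
  const gating = failures.filter((r) => !quarantined.has(r.caseId));
  const ignored = failures.filter((r) => quarantined.has(r.caseId));
  return {
    passed: gating.length === 0,
    gatingFailures: gating.map((r) => r.caseId),
    quarantinedFailures: ignored.map((r) => r.caseId),
  };
}

const results: CaseResult[] = [
  { caseId: "login", status: "pass" },
  { caseId: "checkout", status: "fail" },
  { caseId: "search", status: "fail" },
];

const gate = gateRollup(results, new Set(["search"]));
console.log(gate.passed, gate.gatingFailures, gate.quarantinedFailures);
```

Keeping the quarantined failures in the output, rather than dropping them, preserves the history you need while investigating.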

Rerun-failed flow

If your run.env.runUrl is set, the embedded Kensho viewer offers a Re-run failed action. The platform calls your CI's webhook with the failing case IDs as input — your workflow re-runs only those. The new run lands in Kaizen as a child of the original; flake detection accounts for it.
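As a rough picture of what the webhook input might carry, the sketch below builds a re-run request from a finished run's results. Everything here is an assumption for illustration: the `RerunRequest` shape, its field names, and the `buildRerunRequest` helper are hypothetical, and the real payload depends on your CI integration.

```typescript
interface RunResult {
  caseId: string;
  status: "pass" | "fail" | "skipped";
}

// Hypothetical payload shape; field names are not taken from Kaizen.
interface RerunRequest {
  parentRunId: string; // lets the new run be recorded as a child of the original
  caseIds: string[]; // failing case IDs only
}

function buildRerunRequest(parentRunId: string, results: RunResult[]): RerunRequest | undefined {
  const failing = results.filter((r) => r.status === "fail").map((r) => r.caseId);
  const caseIds = Array.from(new Set(failing)); // drop duplicate IDs
  if (caseIds.length === 0) return undefined; // nothing to re-run
  return { parentRunId, caseIds };
}

const request = buildRerunRequest("run-1", [
  { caseId: "a", status: "fail" },
  { caseId: "b", status: "pass" },
  { caseId: "c", status: "fail" },
  { caseId: "a", status: "fail" },
]);
console.log(JSON.stringify(request));
```

Your workflow would then read the case IDs from the request and restrict the test run to them using whatever filter your test runner supports.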