Bug and Incident Severity Template
Dec 30, 2023
Bug and Incident Severity Guide
Principles
- Fast triage: We grade and prioritise the bug straight away.
- Bias to action: We have a strong bias towards action; we either resolve the issue or close it.
- Ask forgiveness, not permission: Everyone can make a decision. Follow the guidelines, then tell others about your decision.
- Accountability: We’re all collectively accountable for keeping our bugs in a healthy state; no one person or team is responsible. If any one of us shirks this responsibility, that’s the point at which our bugs get out of control.
- Keep it tidy: We groom our bug backlog at the squad level and close bugs that are no longer relevant. We follow a zero-bug policy.
Zero Bug Policy
We operate a zero-bug policy. This means that there are 0 bugs that age past 30 days (1 month).
Exceptions:
- Ping @<DRI_1> or @<DRI_2> to request an exception for your bug
- Comment on the ticket to show that its aging past this point is accepted due to some nuance or complexity around the bug.
- Exceptions happen, but don’t make us come and ask you. Be proactive and manage this yourselves, so it is clear to everyone why this one bug has aged past our threshold.
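The 30-day rule and its exceptions can be checked mechanically. The following is a minimal sketch, not a Jira integration: the `Bug` record, its field names, and the `policy_breaches` helper are hypothetical, and it assumes an accepted exception is recorded as a simple flag on the ticket.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_AGE = timedelta(days=30)  # threshold from the zero-bug policy


@dataclass
class Bug:
    key: str                       # hypothetical ticket key, e.g. "BUG-1"
    opened: date                   # date the bug was raised
    closed: bool = False
    exception_noted: bool = False  # assumption: set once an accepted exception is commented on the ticket


def policy_breaches(bugs, today):
    """Return open bugs that have aged past 30 days with no recorded exception."""
    return [
        bug for bug in bugs
        if not bug.closed
        and not bug.exception_noted
        and today - bug.opened > MAX_AGE
    ]
```

A bug that is exactly 30 days old is not yet a breach in this sketch; it becomes one on day 31.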
Guidelines
To grade the bug at a given severity, it must meet the definition of at least one of the risks in the following table.
Note: If a workaround exists and is acceptable to everyone, then the severity reduces to the next tier.
| Severity | Jira | Platform Risks | Reputational Risk | Financial Risk | User Risk | SLA | Release Strategy |
|---|---|---|---|---|---|---|---|
| Severity 1: Critical | Highest | Severe impact on the software’s functionality or stability. Crashes, data corruption, complete system failure, critical data loss. | Could lead to significant brand damage and loss of trust among existing and potential customers. | ≥£5,000 potential loss due to halted operations and client compensation. | Affects all users and organisations, requiring immediate attention and remediation. For example, an impact on the ability to approve a request or place an order. | 1 hour | Hotfix - Immediate release once the fix is developed, tested, and approved, regardless of the release cycle. |
| Severity 2: High | High | Significant issues that can disrupt the normal operation of the software. Malfunction of important features, leading to frequent errors, posing security risks. | May lead to negative customer perception if exposed and not swiftly addressed. The issue may require public disclosure to existing or prospective customers. | ≥£1,000 potential loss due to decreased productivity or possible legal ramifications. | Impacts a large portion of users or key clients; a quick fix is required to maintain service levels. | Next business day | Expedited Release - Prioritise over other work and include in the next scheduled release, or hotfix if necessary. |
| Severity 3: Medium | Medium | Noticeable impact on the software’s usability or functionality but not a critical issue. Hinders certain features or workflows (but the software remains generally functional). | Some clients might notice and report the issue, creating pockets of dissatisfaction. | Could lead to minor financial losses <£1,000 due to small operational inefficiencies and support costs. | May affect a moderate number of users, likely resulting in support tickets and customer complaints. | 30 days | Next Release - Integrate the fix into the next sprint and release it as part of the normal release cycle. |
| Everything else | Lowest | Minor issues that have little to no impact on the software’s core functionality. Usability issues, non-critical cosmetic or copy problems. | Unlikely to impact customer perception unless widespread or persistent. | Minimal financial risk, primarily related to the time cost of addressing the bug. | Affects only a small number of users, with most probably not noticing or not being significantly bothered. | Only when appropriate, always closed within 30 days | Fix if easy and quick, otherwise close. Consider user feedback and analytics to inform these decisions. |
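One way to read the grading rule is as a small function. In this sketch a person first judges which tier each risk column meets. The bug then takes the most severe of those tiers and, if an acceptable workaround exists, drops one tier. The `Severity` enum and `grade` function are hypothetical names, and "most severe tier met by any risk" is an interpretation of the guideline, not a confirmed rule.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Lower value = more severe, mirroring the tiers in the table.
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOWEST = 4  # "Everything else"


def grade(risk_tiers, acceptable_workaround=False):
    """Grade a bug from the tier met by each assessed risk
    (platform, reputational, financial, user)."""
    # The bug only needs to meet one risk definition, so take the most severe tier.
    severity = min(risk_tiers, default=Severity.LOWEST)
    # An acceptable workaround reduces the severity to the next tier.
    if acceptable_workaround and severity < Severity.LOWEST:
        severity = Severity(severity + 1)
    return severity
```

For example, a bug judged Medium on platform risk but Critical on financial risk grades as Critical, or as High if everyone accepts a workaround.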
Definitions
- Platform risk: impact to the <COMPANY> platform’s core operation or stability
- Reputational risk: impact to <COMPANY>’s brand
- Financial risk: loss we could incur from the issue
- User risk: the number of users or organisations impacted by the issue
- SLA: How long it should take for us to start working on the issue (mean time to respond or MTTR)
- Release: How we should plan and ship the change.
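Because the SLA is a time to start work, it can be turned into a respond-by timestamp. The sketch below makes several assumptions that the guide does not state: "next business day" means the end of the next weekday (Monday to Friday, ignoring public holidays), and the lowest tier uses the same 30-day window as Severity 3. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_business_day(day):
    """Return the next weekday after `day` (assumption: no public holidays)."""
    nxt = day + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def respond_by(severity, reported):
    """Latest time work should start, for severity 1 (critical) to 4 (everything else)."""
    if severity == 1:
        return reported + timedelta(hours=1)
    if severity == 2:
        # Assumption: due by the end of the next business day.
        return datetime.combine(next_business_day(reported.date()), time(23, 59, 59))
    # Severity 3 and everything else: 30 days from the report.
    return reported + timedelta(days=30)
```

Under these assumptions, a Severity 2 bug reported on a Friday afternoon is due by the end of the following Monday.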
Data
You can track all open bugs, bug age and open vs closed rates in Jira [here](https://example.com).
Further Reading