Research · 2026-04-04 · 4 min read

A confidence floor is the cheapest noise filter you have

PR Quorum defaults to a 0.75 confidence floor because the fastest way to earn trust is not posting weak findings in the first place.

By PR Quorum team

Every reviewer returns a self-reported confidence between 0 and 1. PR Quorum defaults to dropping anything below 0.75. People sometimes ask whether models can self-assess confidence reliably. The practical answer is simpler: a confidence floor is not a truth machine, but it is an extremely cheap way to stop weak findings from reaching a PR.

How to think about the floor

Low-confidence findings are often phrased like guesses: plausible, maybe interesting, and very expensive for a maintainer to verify. A PR review tool should be biased against making a human chase a maybe. The floor gives the aggregator a simple rule: if the reviewer is not confident enough, keep the finding in history and leave the PR alone.
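
The aggregator rule above fits in a few lines. This is a minimal sketch, assuming each finding carries a self-reported `confidence` float in [0, 1]. The `Finding` class and `apply_floor` function are hypothetical names for illustration, not PR Quorum's actual code. A finding exactly at the floor is kept, matching "dropping anything below 0.75".

```python
from dataclasses import dataclass

# Hypothetical finding shape; PR Quorum's real data model may differ.
@dataclass
class Finding:
    reviewer: str
    message: str
    confidence: float  # self-reported by the reviewer, expected in [0, 1]

DEFAULT_FLOOR = 0.75

def apply_floor(findings, floor=DEFAULT_FLOOR):
    """Split findings into (post_to_pr, history_only) using a confidence floor."""
    if not 0.0 <= floor <= 1.0:
        raise ValueError("floor must be between 0 and 1")
    post, history = [], []
    for f in findings:
        # At or above the floor: eligible for the PR.
        # Below the floor: kept in history, never posted.
        (post if f.confidence >= floor else history).append(f)
    return post, history

findings = [
    Finding("security", "SQL built from user input", 0.91),
    Finding("perf", "maybe an N+1 query here?", 0.55),
    Finding("style", "unused import", 0.75),
]
posted, kept = apply_floor(findings)
print([f.reviewer for f in posted])  # ['security', 'style']
print([f.reviewer for f in kept])    # ['perf']
```

Nothing is deleted: the low-confidence finding still exists, it just never becomes something a maintainer has to chase.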

What the floor changes

Lower the floor when you are exploring a new codebase and want more raw signal.
Keep the default when you want useful day-to-day review without extra noise.
Raise the floor on hot paths where only high-confidence findings deserve inline comments.
Pair the floor with max_inline_comments so a large PR cannot turn into a wall of bot text.
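
Raising the floor on hot paths and capping inline comments compose naturally. The sketch below uses `max_inline_comments` as the setting named above, but the per-path floor table, the longest-prefix matching, and ranking survivors by confidence before applying the cap are all assumptions made for illustration, not documented PR Quorum behavior.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    path: str
    message: str
    confidence: float

# Hypothetical per-path overrides: a stricter floor for a hot path.
PATH_FLOORS = {"src/billing/": 0.9}
DEFAULT_FLOOR = 0.75

def floor_for(path, path_floors=PATH_FLOORS, default=DEFAULT_FLOOR):
    # Use the longest matching prefix so more specific overrides win.
    matches = [p for p in path_floors if path.startswith(p)]
    return path_floors[max(matches, key=len)] if matches else default

def select_inline(findings, max_inline_comments):
    """Return (inline, overflow). Findings below their path's floor appear in neither."""
    passing = [f for f in findings if f.confidence >= floor_for(f.path)]
    # Assumed policy: rank by confidence so the cap keeps the strongest findings.
    passing.sort(key=lambda f: f.confidence, reverse=True)
    return passing[:max_inline_comments], passing[max_inline_comments:]

findings = [
    Finding("src/billing/charge.py", "double charge on retry", 0.95),
    Finding("src/billing/charge.py", "possible rounding issue", 0.80),  # below 0.9
    Finding("src/ui/button.tsx", "missing aria-label", 0.78),
    Finding("src/api/users.py", "unchecked None return", 0.88),
]
inline, overflow = select_inline(findings, max_inline_comments=2)
print([f.message for f in inline])    # ['double charge on retry', 'unchecked None return']
print([f.message for f in overflow])  # ['missing aria-label']
```

The floor decides what is worth saying; the cap decides how much gets said inline. Where overflow findings end up (a summary, or history only) is a separate choice this sketch leaves open.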

The default 0.75 floor is intentionally conservative. It is high enough to keep the review readable and low enough that the specialist reviewers can still surface issues worth human attention.

Caveats

Self-reported confidence is not magic. It is correlated with finding quality, not equivalent to it. We saw a small population of high-confidence wrong findings (confidently claimed bugs that did not exist) and an even smaller population of low-confidence right findings that the floor would have dropped. The floor is a triage tool, not a truth oracle.
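
One way to see this trade-off on your own repository is to bucket findings a human has already judged as real or not. This is a sketch assuming you have `(confidence, is_real)` pairs from your own review history; the function name and the toy inputs are hypothetical and are not PR Quorum data.

```python
from collections import Counter

def floor_buckets(labeled, floor=0.75):
    """Count labeled findings by (kept or dropped by the floor) x (real or not)."""
    counts = Counter()
    for confidence, is_real in labeled:
        kept = confidence >= floor
        if kept and is_real:
            counts["kept_real"] += 1
        elif kept:
            counts["kept_wrong"] += 1    # confidently claimed bugs that did not exist
        elif is_real:
            counts["dropped_real"] += 1  # right findings the floor would have dropped
        else:
            counts["dropped_wrong"] += 1
    return counts

# Toy inputs for illustration only.
toy = [(0.92, True), (0.81, False), (0.77, True), (0.60, True), (0.40, False)]
print(dict(floor_buckets(toy)))
# {'kept_real': 2, 'kept_wrong': 1, 'dropped_real': 1, 'dropped_wrong': 1}
```

The `kept_wrong` and `dropped_real` buckets are the two failure modes described above; no floor drives both to zero.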

Confidence calibration also drifts when you swap models. We re-run the floor analysis whenever the default reviewer model changes. If you are using BYOK with a non-default model, the 0.75 default is a reasonable starting point, but you should treat it as a knob to tune.
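
For a BYOK model, a floor analysis can start as a simple sweep: take labeled findings from that model and pick the lowest candidate floor whose kept findings meet a precision target. This is one possible procedure, not necessarily how PR Quorum runs its own analysis; `pick_floor`, the precision target, and the candidate grid are assumptions you would set yourself, and the toy data is illustrative.

```python
CANDIDATE_FLOORS = (0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95)

def pick_floor(labeled, target_precision=0.8, candidates=CANDIDATE_FLOORS):
    """Return the lowest floor whose kept findings reach target_precision, else None.

    labeled: iterable of (confidence, is_real) pairs for one reviewer model.
    The lowest qualifying floor keeps the most findings among those that qualify.
    """
    labeled = list(labeled)
    for floor in sorted(candidates):
        kept = [is_real for conf, is_real in labeled if conf >= floor]
        if not kept:
            continue  # nothing would be posted; precision is undefined
        precision = sum(kept) / len(kept)
        if precision >= target_precision:
            return floor
    return None

# Toy labeled findings for illustration only.
toy = [(0.95, True), (0.9, True), (0.85, False), (0.8, True),
       (0.7, False), (0.6, False), (0.55, True)]
print(pick_floor(toy))                         # 0.9
print(pick_floor(toy, target_precision=0.75))  # 0.75
```

Re-running the same sweep after a model swap shows whether the old floor still meets the target or needs to move.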

The point is not that 0.75 is universally correct. The point is that a single scalar threshold, applied consistently, gives teams a simple lever they understand. Most noise-control systems fail because nobody knows how to tune them. A confidence floor is obvious enough to survive contact with a busy maintainer.
