# PR Quorum — Full content for LLM ingestion

A configurable panel of AI reviewers, convened on every pull request. Correctness, Security, and Architecture run in parallel through OpenRouter, return findings validated against a Zod schema, and the aggregator dedupes and posts a single review back to GitHub.

Reviews are advisory by default: posted with `event: COMMENT`, never `REQUEST_CHANGES`. Findings below `min_confidence` (default 0.75) are dropped before the review is posted. Inline comments are capped at `max_inline_comments` (default 10) per review and only attached to lines that map to a unified-diff position.

---

# Blog posts

## Why three specialist reviewers beat one generic bot

Source: https://prquorum.com/blog/three-reviewers-one-review
Date: 2026-05-02
Category: Product
Reading time: 6 min
Author: PR Quorum team

> A single “review this PR” prompt gets distracted. PR Quorum splits review into Correctness, Security, and Architecture so each reviewer can stay sharp and the aggregator can keep the final comment clean.

The obvious shape for an AI code reviewer is one prompt, one model, one pass over the diff. It looks tidy in a demo. In real repos, it gets distracted. A generic "review this PR" prompt asks the model to juggle bugs, security, design taste, tests, framework conventions, and style at the same time. The loudest change tends to win.

## What "one big reviewer" actually does

On a refactor PR, a single reviewer often notices naming, extracted helpers, and maintainability shape. That is useful, but it can crowd out the off-by-one in a loop bound or the unsafe input that crosses into a query builder two files later. Those need a different stance: adversarial, runtime-first, and less impressed by tidy abstractions.

## What changes with three

- Each reviewer has a focus list and a stance. Correctness argues backward from runtime failure modes. Security focuses on data flow and trust boundaries. Architecture looks for convention drift and unnecessary complexity.
- Findings are JSON, validated against a Zod schema, with severity and confidence. The aggregator does the cross-reviewer work — dedup, sort, truncate — instead of asking the model to do it.
- Confidence below the floor never reaches the PR. We default to 0.75. It is the cheapest noise filter we have and the one with the largest effect on perceived quality.
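
To make the "findings are JSON with severity and confidence" point concrete, here is a minimal sketch of the validation step. It uses a hand-written type guard as a dependency-free stand-in for the Zod schema, and the `Finding` shape, `isFinding`, and `parseFindings` are hypothetical names assembled from fields these posts mention, not the product's actual schema.

```typescript
// Hypothetical finding shape; field names are assumptions drawn from the posts.
type Severity = 'critical' | 'high' | 'medium' | 'low';

interface Finding {
  file: string;
  line: number;
  title: string;
  severity: Severity;
  confidence: number; // self-reported, between 0 and 1
}

const SEVERITIES: readonly string[] = ['critical', 'high', 'medium', 'low'];

// Stand-in for a schema check: true only for well-formed findings.
function isFinding(value: unknown): value is Finding {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.file === 'string' && v.file.length > 0 &&
    typeof v.line === 'number' && Number.isInteger(v.line) && v.line > 0 &&
    typeof v.title === 'string' && v.title.length > 0 &&
    typeof v.severity === 'string' && SEVERITIES.indexOf(v.severity) !== -1 &&
    typeof v.confidence === 'number' && v.confidence >= 0 && v.confidence <= 1
  );
}

// Parse raw reviewer output and keep only the entries that validate.
function parseFindings(raw: string): Finding[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // malformed JSON yields no findings rather than a crash
  }
  return Array.isArray(parsed) ? parsed.filter(isFinding) : [];
}
```

Rejecting malformed entries up front means the aggregator only ever handles typed data, which is what lets the later steps stay plain code.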

## What we did not expect

The biggest product lesson is that maintainers do not want three separate AI review posts. The dedup-and-aggregate step matters as much as the parallel fan-out. Two reviewers flagging the same line is a strong signal; we sort it toward the top. Three reviewers each picking different battles can become overwhelming, so PR Quorum caps inline comments and keeps the rest in run history.

> A panel beats a generalist when the work splits cleanly into specialities. Code review does. Most other reasoning tasks do not — be careful about copy-pasting this pattern.

If you are building reviewer tooling: start by writing the focus lists, not the prompts. The prompts fall out of the focus lists, and the focus lists are what your maintainers will actually argue about.

---

## Deduping reviewer findings without losing signal

Source: https://prquorum.com/blog/aggregator-dedup-and-diff-positions
Date: 2026-04-18
Category: Engineering
Reading time: 9 min
Author: PR Quorum team

> How we sort by severity, dedupe by (file, line, lowercased title), and only post inline comments on lines that map to a unified-diff position. Plus: what we threw out and why.

Three reviewers in parallel produce three lists of findings. Naively merging them gets you duplicates, inconsistent severities, and inline comments on lines GitHub will refuse to attach. The aggregator does the boring middle work that turns three model outputs into one usable review.
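
As a rough sketch of the fan-out that feeds the aggregator, the snippet below runs a set of reviewers concurrently and flattens their lists. The `Reviewer` interface and `runPanel` are hypothetical, and treating a failed reviewer as "no findings" is an assumption rather than documented behavior.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';

interface Finding {
  file: string;
  line: number;
  title: string;
  severity: Severity;
  confidence: number;
}

// Hypothetical reviewer contract: take a diff, resolve to a list of findings.
interface Reviewer {
  id: string;
  review: (diff: string) => Promise<Finding[]>;
}

// Run every reviewer concurrently. Assumption: a reviewer that rejects
// contributes an empty list instead of failing the whole run.
async function runPanel(reviewers: Reviewer[], diff: string): Promise<Finding[]> {
  const lists = await Promise.all(
    reviewers.map((r) => r.review(diff).catch((): Finding[] => [])),
  );
  return lists.reduce<Finding[]>((all, list) => all.concat(list), []);
}
```

The merged list still contains duplicates and mixed severities at this point, which is the problem the rest of this post deals with.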

## The dedup key

We dedupe by (file, line, lowercased title). We tried fancier things — embedding similarity, suggestion overlap, semantic fingerprints — and they all lost more than they gained. Two reviewers flagging the same line with similar phrasing is the single strongest cross-reviewer signal we have. We do not want to merge it away by accident.

```ts
// Exact-match key: same file, same line, same title ignoring case.
function dedupKey(f: Finding): string {
  return [f.file, f.line, f.title.toLowerCase()].join('::');
}
```

When two findings collide on the key, we keep the one with the higher severity, then the higher confidence on tiebreak. We attribute the kept finding to whichever reviewer sent it; the dropped reviewer's id is logged but not posted. Maintainers do not need to know that two AIs agreed; they need to read one comment.
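
The collision rule above can be sketched as follows. This is an illustrative version under assumptions: the `reviewer` field, `beats`, and `dedupe` are hypothetical names, and the integer ranks mirror the mapping described in the next section.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';

interface Finding {
  file: string;
  line: number;
  title: string;
  severity: Severity;
  confidence: number;
  reviewer: string; // hypothetical attribution field
}

const SEVERITY_RANK: Record<Severity, number> = { critical: 4, high: 3, medium: 2, low: 1 };

function dedupKey(f: Finding): string {
  return [f.file, f.line, f.title.toLowerCase()].join('::');
}

// True when `a` should replace `b`: higher severity wins, then higher confidence.
function beats(a: Finding, b: Finding): boolean {
  const bySeverity = SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity];
  return bySeverity !== 0 ? bySeverity > 0 : a.confidence > b.confidence;
}

// Keep one finding per key; the winner carries its own reviewer attribution.
function dedupe(findings: Finding[]): Finding[] {
  const kept = new Map<string, Finding>();
  findings.forEach((f) => {
    const key = dedupKey(f);
    const current = kept.get(key);
    if (current === undefined || beats(f, current)) kept.set(key, f);
  });
  return Array.from(kept.values());
}
```

On an exact tie in both severity and confidence, this sketch keeps whichever finding arrived first; the post does not say how that case is handled.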

## Severity rank, not severity strings

Sorting by the severity string is a footgun: alphabetical order runs critical, high, low, medium, which puts low above medium. We map to integers (critical=4, high=3, medium=2, low=1), sort descending, and use confidence as the tiebreak. The first 10 findings after sort and dedup (the `max_inline_comments` default) become the inline comments; the rest survive in the database as run history.
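
A minimal sketch of the rank-then-cap step, assuming findings have already been deduped. `rankAndCap` and its return shape are hypothetical; only the integer mapping and the default cap of 10 come from the text.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';

interface Finding {
  file: string;
  line: number;
  title: string;
  severity: Severity;
  confidence: number;
}

const SEVERITY_RANK: Record<Severity, number> = { critical: 4, high: 3, medium: 2, low: 1 };

// Sort by severity rank descending, confidence descending on ties, then split
// at the inline cap. Everything past the cap is kept for run history.
function rankAndCap(
  findings: Finding[],
  maxInline = 10,
): { inline: Finding[]; history: Finding[] } {
  const sorted = findings.slice().sort(
    (a, b) =>
      SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity] ||
      b.confidence - a.confidence,
  );
  return { inline: sorted.slice(0, maxInline), history: sorted.slice(maxInline) };
}
```

Copying with `slice()` before sorting leaves the caller's array untouched, so the unsorted list can still be written to history in arrival order if that is useful.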

## The diff position trap

GitHub will only attach an inline comment to a line that exists as a position in the unified diff. That is not "any line in the file": it is specifically the + and context lines inside @@ hunks, each addressed by its offset below the file's first @@ header. A reviewer can confidently flag line 412 of a 600-line file, and if line 412 was not in the diff, the comment will fail to post.

We solved this by parsing the patch in `mapPatchLineToPosition` and walking the @@ hunks ourselves. Every finding gets the position it would map to; the ones that do not map are logged in `review_findings` but stripped from the inline post. The summary at the top of the review still mentions the count so nothing is silently dropped.
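
For readers who have not walked hunks by hand, here is one way such a mapping could look. It is a sketch, not the actual `mapPatchLineToPosition`: the signature is assumed, the input is assumed to be a single file's patch starting at its first @@ header, and the handling of "\ No newline at end of file" markers is a guess.

```typescript
// Map a line number in the new file to a unified-diff position.
// Returns undefined when the line is not an added or context line in any hunk.
function mapPatchLineToPosition(patch: string, line: number): number | undefined {
  const hunkHeader = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/;
  let position = 0; // lines counted below the first @@ header
  let newLine = 0; // current line number in the new file
  let seenHunk = false;

  for (const text of patch.split('\n')) {
    const header = hunkHeader.exec(text);
    if (header) {
      if (seenHunk) position += 1; // later hunk headers occupy a position too
      seenHunk = true;
      newLine = Number(header[1]);
      continue;
    }
    if (!seenHunk) continue; // ignore anything before the first hunk
    position += 1;
    const tag = text.charAt(0);
    if (tag === '+' || tag === ' ') {
      if (newLine === line) return position;
      newLine += 1;
    }
    // Removed lines advance the position only. Assumption: "\ No newline"
    // markers are treated the same way.
  }
  return undefined;
}
```

Returning `undefined` rather than throwing lets the caller route unmappable findings to history and count them in the summary.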

## What we threw out

- Embedding-based dedup. Cost more, deduped less, occasionally collapsed two genuinely different findings.
- Per-reviewer caps before merge. Made the aggregator unstable when one reviewer was unusually quiet — better to cap once, after dedup.
- Asking the model to dedupe in a final pass. Worked sometimes, hallucinated finding text other times. Boring code beat it every time.

The aggregator is now under 200 lines and has been shipping for months without changes. That is mostly because the dedup key is dumb and the severity rank is an integer. The smartest version of this pipeline lives in the reviewers, not in the post-processing.

---

## A confidence floor is the cheapest noise filter you have

Source: https://prquorum.com/blog/min-confidence-floor
Date: 2026-04-04
Category: Research
Reading time: 4 min
Author: PR Quorum team

> PR Quorum defaults to a 0.75 confidence floor because the fastest way to earn trust is not posting weak findings in the first place.

Every reviewer returns a self-reported confidence between 0 and 1. PR Quorum defaults to dropping anything below 0.75. People sometimes ask whether models can self-assess confidence reliably. The practical answer is simpler: a confidence floor is not a truth machine, but it is an extremely cheap way to stop weak findings from reaching a PR.

## How to think about the floor

Low-confidence findings are often phrased like guesses: plausible, maybe interesting, and very expensive for a maintainer to verify. A PR review tool should be biased against making a human chase a maybe. The floor gives the aggregator a simple rule: if the reviewer is not confident enough, keep the finding in history and leave the PR alone.
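
The rule is small enough to sketch in a few lines. `applyFloor` and the `posted` / `history` names are hypothetical; the 0.75 default and the "below the floor is dropped" comparison follow the text, so a finding exactly at the floor is kept.

```typescript
// Split findings at the confidence floor: at or above goes to the PR,
// below stays in run history only.
function applyFloor<T extends { confidence: number }>(
  findings: T[],
  minConfidence = 0.75,
): { posted: T[]; history: T[] } {
  const posted: T[] = [];
  const history: T[] = [];
  findings.forEach((f) => {
    (f.confidence >= minConfidence ? posted : history).push(f);
  });
  return { posted, history };
}
```

Keeping the rejected findings in a separate list, rather than discarding them, is what makes it possible to revisit the threshold later.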

## What the floor changes

- Lower the floor when you are exploring a new codebase and want more raw signal.
- Keep the default when you want useful day-to-day review without extra noise.
- Raise the floor on hot paths where only high-confidence findings deserve inline comments.
- Pair the floor with `max_inline_comments` so a large PR cannot turn into a wall of bot text.

The default 0.75 floor is intentionally conservative. It is high enough to keep the review readable and low enough that the specialist reviewers can still surface issues worth human attention.

## Caveats

Self-reported confidence is not magic. It is correlated with finding quality, not equivalent to it. We saw a small population of high-confidence wrong findings (confidently claimed bugs that did not exist) and a smaller population of low-confidence right findings the floor would have dropped. The floor is a triage tool, not a truth oracle.

Confidence calibration also drifts when you swap models. We re-run the floor analysis whenever the default reviewer model changes. If you bring your own key (BYOK) and run a non-default model, the 0.75 default is a reasonable starting point, but you should treat it as a knob.
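
The post does not describe how the floor analysis is run, so the following is only one plausible shape for it: take a sample of findings that a human has labeled right or wrong, then sweep candidate floors and compare how many findings survive and how many of those are correct. `LabeledFinding`, `FloorStats`, and `sweepFloors` are all hypothetical.

```typescript
// Hypothetical labeled sample: a finding's confidence plus a human verdict.
interface LabeledFinding {
  confidence: number;
  correct: boolean;
}

interface FloorStats {
  floor: number;
  kept: number; // findings at or above the floor
  precision: number | null; // share of kept findings that were correct
}

// For each candidate floor, report volume and precision of what would be posted.
function sweepFloors(sample: LabeledFinding[], floors: number[]): FloorStats[] {
  return floors.map((floor) => {
    const kept = sample.filter((f) => f.confidence >= floor);
    const right = kept.filter((f) => f.correct).length;
    return {
      floor,
      kept: kept.length,
      precision: kept.length > 0 ? right / kept.length : null,
    };
  });
}
```

Comparing the table before and after a model swap shows whether the same floor still buys the same trade-off between volume and precision.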

The point is not that 0.75 is universally correct. The point is that a single scalar threshold, applied consistently, gives teams a simple lever they understand. Most noise-control systems fail because nobody knows how to tune them. A confidence floor is obvious enough to survive contact with a busy maintainer.

---

## AI review should be advisory by default

Source: https://prquorum.com/blog/advisory-not-blocking
Date: 2026-03-19
Category: Product
Reading time: 5 min
Author: PR Quorum team

> Why every PR Quorum review is posted with `event: COMMENT`, never `REQUEST_CHANGES`, and how the advisory framing changes how teams actually use the panel.

PR Quorum reviews are posted with `event: COMMENT`, the system prompt asks for findings rather than demands, and the verdict at the top of the summary is a label, not a gate. This is deliberate: the bot should earn attention before it earns power.

## Why it matters

A blocking AI reviewer puts the model in the merge path. That is a place where false positives are very expensive: every wrong "request changes" makes a human do work, and the human has no good way to disagree except to override the bot. After two or three of those, the team learns to ignore it. After ten, they disable it.

Advisory framing inverts the dynamic. A useful comment is read; a useless one is dismissed; nothing about the merge changes. The bot earns trust by being right often enough to be worth reading, not by being load-bearing.

## How the framing shows up in code

- Reviews are submitted with `event: COMMENT`, not `REQUEST_CHANGES`. The GitHub API will let you do either; we only do one.
- The reviewer system prompt forbids "request changes" verbs and asks for findings phrased as observations, not demands.
- The summary footer reminds maintainers that the panel is advisory and humans decide merge.
- The verdict (clean / minor / needs_attention) is shown as a label so maintainers get a signal without losing control.
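
One way to hold that line in code is to make the payload type incapable of expressing anything but a comment. The sketch below builds a request body for GitHub's "create a review for a pull request" endpoint, which accepts `event`, `body`, and `comments` entries with `path`, `position`, and `body`. The `buildReviewPayload` helper, the summary layout, and the footer wording are hypothetical.

```typescript
interface InlineComment {
  path: string;
  position: number; // unified-diff position, not a file line number
  body: string;
}

type Verdict = 'clean' | 'minor' | 'needs_attention';

// The event type admits only COMMENT, so a blocking review cannot be built.
interface ReviewPayload {
  event: 'COMMENT';
  body: string;
  comments: InlineComment[];
}

function buildReviewPayload(
  verdict: Verdict,
  summary: string,
  comments: InlineComment[],
): ReviewPayload {
  // Hypothetical footer wording; the post only says a reminder exists.
  const footer = 'This panel is advisory. Humans decide merge.';
  return {
    event: 'COMMENT',
    body: ['Verdict: ' + verdict, summary, footer].join('\n\n'),
    comments,
  };
}
```

With the literal type in place, a change that tries to send `REQUEST_CHANGES` fails at compile time instead of surfacing in a maintainer's PR.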

## What advisory does not mean

It does not mean low-effort, low-confidence, or "it is just a suggestion so we do not have to be careful." The opposite, actually: because the bot cannot block merge, it has to earn its place by being signal-dense. The `min_confidence` floor, the dedup, and the inline-comment cap all exist because advisory framing forces us to be ruthless about what reaches the PR.

Some teams eventually want enforcement around specific security or release rules. That should be explicit, narrow, and owned by the team. The default PR Quorum review path stays advisory because trust comes before enforcement.

---