PRLens: Open Source AI Code Review You Actually Control
Most AI code review tools run on infrastructure you don't own. Here's why that matters, and what I built instead.
I’ve spent a fair amount of time looking for the right AI code review tool — tried a few, experimented seriously with some. None of them felt complete.
The problems were consistent. Hidden costs with no separation between LLM token spend and platform cost. Shallow control — you could tweak things at the edges but never quite shape the tool to how your team actually works. I wanted granular guidelines, reviewer profiles that could approve without needing write access, comments that matched the code structure and culture of the team. Small things individually, but together they add up to a tool that’s always slightly off.
The more time I spent with these tools, the more one thing became clear: the purpose of a code review tool is to serve the developers and teams using it, and that means they should own it. Not a vendor. Not a black box running on infrastructure you can’t inspect or control.
That’s what PRLens is built around.
The Harder Problem: The Review Isn’t Yours to Control
When you install a tool like CodeRabbit, you’re installing a GitHub App. Their servers receive the webhook. Their infrastructure runs the review. Their configuration determines what happens. You see the output — the comments on your PR — and nothing else.
That’s fine when everything works. It becomes a problem the moment you want to change anything.
A developer wants to understand why a specific comment was posted — what context the model saw, what prompt was used. There’s nowhere to look. A team wants to run the review locally before it posts publicly to a PR, to check whether it’s calibrated correctly for their codebase. There’s no mechanism for that. An engineer wants to scope the review to specific files, skip it for a hotfix branch, or tune the behaviour for a particular type of change. They can adjust a YAML config within the limits the vendor has chosen to expose — and that’s the ceiling.
The tool is not yours. It runs adjacent to your workflow, on infrastructure you don’t control, with behaviour you can observe but not meaningfully inspect or modify.
This matters beyond the obvious operational risks — the quota exhaustion, the opaque failures. It matters because code review is where your team’s standards live. The judgements being made about your code, the guidelines being applied, the patterns being flagged or ignored — these should sit under the same level of control as the rest of your engineering stack. Not delegated to a vendor’s black box.
The context problem is downstream of this. A diff reviewed without knowledge of the surrounding codebase produces generic, low-signal output: style suggestions, naming conventions, comments that would be reasonable on code the model had never seen before. But even if you recognise this and want to fix it — by injecting co-change history, by supplying architectural context, by tuning what the model sees — you can’t. The context pipeline isn’t yours to change.
The Problem Everyone Talks About: The Price
If your team is already paying for Claude or GPT-4o through your IDE, your enterprise agreement, or your internal tooling, you are paying again when a tool like CodeRabbit uses a model to review your PRs and buries that cost inside a per-seat subscription you didn’t negotiate. I’ve written about this pattern in detail here.
We had an Anthropic enterprise agreement already in place. We couldn’t route the reviewer through it. The tool uses its own API access, not yours — and there is no configuration that changes this. So we were paying for Claude twice: once through our agreement, and again implicitly through the per-seat subscription.
Then the reviews stopped. The tool had exhausted its own quota. We had no lever to pull — no way to increase limits, no way to fall back to our own account, no visibility into what had happened. We found out because a developer noticed the bot had gone quiet. That’s not an edge case. That’s what full dependency on someone else’s infrastructure looks like in practice.
At $24 per seat per month (Pro tier) with unlimited reviews, the math is doing a lot of quiet work. One developer pushing 20 PRs a day on a monorepo and another pushing 2 PRs a week on a microservice generate wildly different inference costs. The tool absorbs both into the same number, and you have no visibility into which provider ran, which model version, or how much of your subscription was product value versus token pass-through.
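To make that asymmetry concrete, here’s a back-of-envelope sketch. Every number in it is an assumption (tokens per review, model prices), not a measurement of any particular tool or model:

# Illustrative only: prices and token counts are assumptions.
PRICE_PER_MTOK_IN = 3.00        # assumed $ per million input tokens
PRICE_PER_MTOK_OUT = 15.00      # assumed $ per million output tokens
TOKENS_IN_PER_REVIEW = 20_000   # assumed: diff plus injected context
TOKENS_OUT_PER_REVIEW = 2_000   # assumed: the review comments themselves

def monthly_cost(reviews_per_month: int) -> float:
    cost_in = reviews_per_month * TOKENS_IN_PER_REVIEW / 1e6 * PRICE_PER_MTOK_IN
    cost_out = reviews_per_month * TOKENS_OUT_PER_REVIEW / 1e6 * PRICE_PER_MTOK_OUT
    return cost_in + cost_out

print(f"20 PRs/day (~400 reviews/month): ${monthly_cost(400):.2f}")  # ~$36
print(f"2 PRs/week (~8 reviews/month):   ${monthly_cost(8):.2f}")    # ~$0.72

Under these assumptions, one seat consumes more inference than its subscription price and the other consumes almost none, and the bundled number tells you neither.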
As inference costs keep dropping — and they will; the Epoch AI data shows a median decline of 50x per year — that opacity becomes a bigger problem, not a smaller one. Teams who’ve standardised on a provider, who’ve negotiated enterprise rates, who have compliance requirements around which models touch their code — they need to be able to see and control this layer. A bundled subscription can’t give them that.
What I Built: PRLens
I built PRLens: an open source AI PR reviewer designed around a few principles that address both problems directly.
Bring your own model. Your API key, your provider, your rate card. The tool works with Anthropic Claude or OpenAI GPT-4o, and the choice is a single config line. If your team is already paying for Claude, PRLens costs you nothing extra for the intelligence layer. The inference spend goes through your account, shows up in your usage dashboards, and sits under your data policies.
pip install 'prlens[anthropic]'
prlens init
prlens init handles the rest — creates the config, sets up team review history, generates the GitHub Actions workflow. No new vendor relationship required.
Actual codebase context, not just the diff. For every changed file, PRLens injects three signals into the review that a diff alone can’t provide:
The repository file tree at the PR’s exact head commit, so the model can reason about architecture layers, test coverage, and whether the changed code has a corresponding test file that wasn’t updated. The co-change history from git, which surfaces files that tend to move together — a proxy for architectural coupling that doesn’t show up in import statements. And the paired test file matched by filename pattern, giving the model context on what’s already covered versus what’s newly exposed.
This isn’t perfect. It’s not the same as a senior engineer who’s been in the codebase for two years. But it’s substantially better than reviewing a patch in isolation — and it’s the foundation for something much more capable.
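To make the co-change signal concrete: it can be derived from nothing more than git log. What follows is a simplified sketch of the idea rather than PRLens’s actual implementation, and the file path in the usage example is made up:

import subprocess
from collections import Counter
from itertools import combinations

def co_change_counts(repo_path: str, max_commits: int = 500) -> Counter:
    """Count how often pairs of files change in the same commit.

    A rough proxy for architectural coupling: files that move together
    are often related even when no import statement connects them.
    """
    log = subprocess.run(
        ["git", "-C", repo_path, "log", f"-{max_commits}",
         "--name-only", "--pretty=format:--commit--"],
        capture_output=True, text=True, check=True,
    ).stdout

    pairs: Counter = Counter()
    for commit in log.split("--commit--"):
        files = sorted({line.strip() for line in commit.splitlines() if line.strip()})
        for a, b in combinations(files, 2):
            pairs[(a, b)] += 1
    return pairs

# Files that historically change alongside a (hypothetical) file touched in the PR:
counts = co_change_counts(".")
related = [(pair, n) for pair, n in counts.most_common()
           if any(p.endswith("billing/invoice.py") for p in pair)]
print(related[:5])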
Your guidelines, not a generic ruleset. The review instructions come from a Markdown file you write and own, checked into your repo. Your architecture decisions, your domain conventions, the things your team cares about that no generic style guide will ever capture.
# .prlens.yml
model: anthropic
guidelines: ./docs/guidelines.md
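The file is read as plain review instructions, so what goes in it is entirely up to your team. A hypothetical excerpt:

# docs/guidelines.md (hypothetical excerpt)
- Service-layer modules must not import from the HTTP handler package.
- Every new database query needs an index; flag it if one isn’t mentioned.
- Public functions need docstrings; internal helpers don’t.
- Prefer explicit error types over bare exceptions.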
Incremental reviews. PRLens tracks what it’s already reviewed and only re-reviews files changed since the last run. If you push a fix commit, it reviews the fix — not the whole PR again.
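Conceptually that’s just a diff against the last commit it reviewed. A minimal sketch, assuming the last reviewed SHA lives in the history store (the real implementation may differ):

import subprocess

def files_changed_since(repo: str, last_reviewed_sha: str, head_sha: str) -> list[str]:
    """Return only the files that changed after the last reviewed commit."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only",
         f"{last_reviewed_sha}..{head_sha}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [f for f in out.splitlines() if f]

# After the run, head_sha becomes the new last_reviewed_sha in the history store.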
Shadow mode for building trust. The --shadow flag runs a full review and prints every comment to your terminal without posting anything to GitHub. It’s the right way to onboard a sceptical team before the AI starts talking on their PRs.
No external service dependency. Tools like CodeRabbit install as a GitHub App — their servers receive your webhook, run the review, and post the result. When their service is degraded, your reviews stop. PRLens runs as a plain GitHub Action: it’s code that lives in your repo, executes in your CI pipeline, and has no runtime dependency on any third-party service. If it breaks, you can see why and fix it yourself.
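For context, the generated workflow has roughly this shape. The exact command and secret names below are assumptions; prlens init writes the real file for you:

# .github/workflows/prlens.yml (illustrative shape, not the generated file verbatim)
name: PRLens review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write      # needed to post review comments
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0        # full history, so co-change signals work
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install 'prlens[anthropic]'
      - run: prlens review      # assumed command name
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}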
Zero-infrastructure team history. Review history is stored in a private GitHub Gist by default. No server to run, no database to maintain. prlens init provisions it automatically. Every developer on the team can access shared history with no extra credentials — just gh auth login. (If your team has data residency requirements, the SQLite backend keeps everything local.)
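Under the hood, a private Gist is just a small versioned JSON store the whole team can read and write. Here’s a sketch of the mechanics against the Gist REST API; PRLens itself rides on gh auth, and the gist ID and file name below are placeholders:

import json
import os
import requests

GIST_ID = "<your-gist-id>"           # provisioned by prlens init in practice
TOKEN = os.environ["GITHUB_TOKEN"]
HEADERS = {"Authorization": f"Bearer {TOKEN}",
           "Accept": "application/vnd.github+json"}

def load_history() -> dict:
    gist = requests.get(f"https://api.github.com/gists/{GIST_ID}",
                        headers=HEADERS, timeout=10).json()
    return json.loads(gist["files"]["history.json"]["content"])

def save_history(history: dict) -> None:
    requests.patch(f"https://api.github.com/gists/{GIST_ID}",
                   headers=HEADERS, timeout=10,
                   json={"files": {"history.json": {
                       "content": json.dumps(history, indent=2)}}})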
Where This Needs to Go
PRLens has a solid foundation, but the gap between what it is today and what it should eventually be is real — and here’s where I think the most important work lies:
Language-aware context. Right now, context signals come from git history and filename patterns, an approach that is language-agnostic but coarse. The right next layer is symbol-aware context: when a function changes, fetch its callers; when a class is modified, surface its implementors. This requires tree-sitter parsing for each language and is a meaningful engineering investment, but it’s the difference between contextual reviews and truly intelligent ones.
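None of that exists yet, but to make it concrete, here is roughly what a caller lookup could look like for Python code, using the tree_sitter_languages package. A sketch only; the real layer needs per-language grammars and far more care:

from tree_sitter_languages import get_parser  # bundles prebuilt grammars

def find_callers(source: bytes, function_name: str) -> list[int]:
    """Return the line numbers where function_name is called (Python grammar)."""
    tree = get_parser("python").parse(source)
    lines: list[int] = []

    def walk(node):
        if node.type == "call":
            fn = node.child_by_field_name("function")
            if fn is not None and source[fn.start_byte:fn.end_byte] == function_name.encode():
                lines.append(node.start_point[0] + 1)  # start_point is 0-based (row, col)
        for child in node.children:
            walk(child)

    walk(tree.root_node)
    return lines

print(find_callers(b"total = compute_invoice(order)\n", "compute_invoice"))  # [1]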
Learned team patterns. The history store records what was reviewed. Using that history to improve future reviews — stopping repeated false positives, recognising reappeared anti-patterns, flagging files with a history of bugs — is the feature that makes the tool genuinely smarter over time rather than just consistent.
One-click fix suggestions. GitHub’s review API supports suggestion blocks that authors can apply directly. Converting comment suggestions into actionable GitHub suggestions would dramatically improve the rate at which comments get acted on — which is the only thing that ultimately matters.
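The mechanics are straightforward: a review comment whose body contains a fenced suggestion block, posted against a specific line through the pull request comments endpoint, shows up as a patch the author can apply with one click. A minimal sketch:

import os
import requests

def post_suggestion(owner, repo, pr_number, commit_id, path, line, replacement):
    # The ```suggestion fence is what GitHub renders as a one-click patch
    # for the commented line.
    body = "Consider this instead:\n\n```suggestion\n" + replacement + "\n```"
    requests.post(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/comments",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
                 "Accept": "application/vnd.github+json"},
        json={"body": body, "commit_id": commit_id, "path": path,
              "line": line, "side": "RIGHT"},
        timeout=10,
    )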
More providers and models. Mistral, Gemini, local models via Ollama for teams with strict data policies. The provider abstraction is already there. It just needs implementations.
Better coverage for non-Python codebases. The core logic is language-agnostic, but the testing and tuning so far have been heavier on Python. Contributors who work primarily in Go, TypeScript, or Rust would bring both coverage and domain intuition that would improve the tool for those communities.
An Invitation
I open sourced PRLens because a tool that sits this close to your code and your team’s standards shouldn’t be a black box — and because the best version of it will come from developers who’ve felt the same friction and have ideas about how to fix it.
The repo is at github.com/prlens/prlens. It’s MIT licensed, has a proper CONTRIBUTING.md, and the issues list has specific areas where contributions would make the biggest difference.
If you want to try it: pip install 'prlens[anthropic]' followed by prlens init gets you running in under a minute.
The goal is a code review tool that evolves the way good developer tools do: through the collective experience of the developers who use it, shaping it toward the thing they actually need.
Open source. MIT licensed. Built for developers who want to own their review stack.