
Benchmarks

Estimated cost and performance for representative RepoKeeper operations. All figures are approximate and based on real-world usage with DeepSeek models (the default). Actual results vary with repository size, issue complexity, and model choice.

Implementation Agent

Cost to generate and verify a PR from an issue. Measured end-to-end: context collection → LLM call(s) → verification → PR creation.

| Scenario | Files in context | Prompt tokens | Completion tokens | Total tokens | Cost (DeepSeek-Chat) | Duration |
|---|---|---|---|---|---|---|
| Simple fix (1 file, typo) | 8 | 3,200 | 800 | 4,000 | < $0.001 | ~15 s |
| Medium fix (3 files, bug) | 25 | 12,000 | 2,500 | 14,500 | ~$0.002 | ~30 s |
| Feature (5 files, new test) | 40 | 22,000 | 5,000 | 27,000 | ~$0.004 | ~50 s |
| Complex (10+ files, refactor) | 60 | 40,000 | 8,000 | 48,000 | ~$0.008 | ~90 s |
| Large (max context, big repo) | 60 | 55,000 | 8,000 | 63,000 | ~$0.010 | ~120 s |

Notes:

  • "Duration" is wall-clock time from issue trigger to PR open, covering context collection, LLM streaming, and git operations.
  • Time spent running verification (lint + test) varies widely with project size and is not included in the Duration figures.
  • DeepSeek-Reasoner costs ~4× more per input token and ~8× more per output token than DeepSeek-Chat, but it can handle significantly more complex issues in a single pass, reducing retry attempts.
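The end-to-end flow above (context collection → LLM call(s) → verification → PR creation) includes a verification fix loop: when lint or tests fail, the failure output is fed back for another attempt. Below is a minimal sketch of that loop, assuming the LLM call and the verifier are plain callables. `implement`, `RunResult`, and the `max_retries=2` default are hypothetical names chosen for illustration, not RepoKeeper's API. Each retry is one more LLM call, which is why fewer retries translate directly into lower cost.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical stage signatures (not RepoKeeper's real internals).
Generate = Callable[[str, Optional[str]], str]   # (issue, feedback) -> patch
Verify = Callable[[str], Tuple[bool, str]]       # patch -> (passed, log)


@dataclass
class RunResult:
    patch: Optional[str]   # None if every attempt failed verification
    attempts: int          # LLM passes used, including the first one
    logs: List[str] = field(default_factory=list)


def implement(issue: str, generate: Generate, verify: Verify,
              max_retries: int = 2) -> RunResult:
    """Generate a patch, verify it, and retry with the failure log as feedback."""
    feedback: Optional[str] = None
    logs: List[str] = []
    for attempt in range(1, max_retries + 2):  # first pass + max_retries retries
        patch = generate(issue, feedback)       # one LLM call per attempt
        passed, log = verify(patch)             # e.g. lint + test run
        logs.append(log)
        if passed:
            return RunResult(patch, attempt, logs)  # caller would now open the PR
        feedback = log                              # next attempt sees the failure
    return RunResult(None, max_retries + 1, logs)
```

With stub callables, a patch that fails once and passes on the retry comes back with `attempts == 2`; a patch that never passes returns `patch=None` after `max_retries + 1` LLM calls.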

Two-step smart selection savings

When agent.smart_file_selection is enabled (default), RepoKeeper sends a file listing first, lets the LLM pick ~10–30 relevant files, and only then reads their content. This cuts context tokens by 40–70% compared to direct "send everything" collection, reducing both cost and latency.

| Strategy | Files read | Context tokens | Context cost (DeepSeek-Chat, input only) |
|---|---|---|---|
| Direct (60 files) | 60 | ~55,000 | ~$0.008 |
| Two-step (LLM picks 15) | 15 | ~18,000 | ~$0.003 |
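The two-step flow can be sketched as follows, with the LLM's file-picking step replaced by an ordinary function so the example runs offline. `collect_context`, `keyword_picker`, and the `max_files=30` cap (taken from the upper end of the ~10–30 range) are hypothetical and not part of RepoKeeper. The point is structural: the picker sees only paths, and file contents are read only for the paths it returns.

```python
from typing import Callable, Dict, List

# Hypothetical picker: (issue text, list of paths) -> paths judged relevant.
# In RepoKeeper this step is an LLM call; here it is a plain function.
Picker = Callable[[str, List[str]], List[str]]


def collect_context(issue: str, repo: Dict[str, str], pick: Picker,
                    max_files: int = 30) -> Dict[str, str]:
    """Two-step context collection (sketch).

    Step 1: hand the picker only the file listing (paths, no contents).
    Step 2: read contents for the chosen paths only.
    """
    listing = sorted(repo)                      # step 1 input: paths only
    chosen = pick(issue, listing)[:max_files]   # cap the selection size
    # Step 2: read only the selected files, skipping paths that don't exist.
    return {path: repo[path] for path in chosen if path in repo}


def keyword_picker(issue: str, paths: List[str]) -> List[str]:
    """Offline stand-in for the LLM: keep paths whose file stem appears in the issue."""
    words = set(issue.lower().split())
    return [p for p in paths if p.rsplit("/", 1)[-1].split(".")[0].lower() in words]
```

For example, with a three-file repo containing `src/parser.py`, `src/cli.py`, and `docs/index.md`, the issue "Fix crash in parser on empty input" yields a context holding only `src/parser.py`; the other two files are never read.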

Code Review

Cost to review a pull request with inline comments.

| PR size | Files changed | Diff lines | Prompt tokens | Completion tokens | Cost (DeepSeek-Chat) | Duration |
|---|---|---|---|---|---|---|
| Small (1 file) | 1 | 20 | 5,000 | 1,500 | ~$0.001 | ~10 s |
| Medium (5 files) | 5 | 150 | 15,000 | 3,000 | ~$0.003 | ~25 s |
| Large (15 files) | 15 | 500 | 35,000 | 6,000 | ~$0.007 | ~50 s |

Radar Scan

Cost to scan recent issues and discussions for keyword matches.

| Items scanned | Hits classified | Prompt tokens | Cost (DeepSeek-Chat) | Duration |
|---|---|---|---|---|
| 30 issues | 5 | 8,000 | ~$0.001 | ~12 s |
| 50 issues | 12 | 18,000 | ~$0.003 | ~25 s |
| 100 issues | 30 | 40,000 | ~$0.006 | ~50 s |
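The radar scan is described as a keyword match over recent issues and discussions, followed by classification of the hits. Below is a minimal sketch of the matching step, assuming a case-insensitive substring match over each item's title and body. The `Item` type and `keyword_hits` function are hypothetical, not RepoKeeper's API. In this sketch, only the returned hits would be passed on to the (LLM) classification step.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Item:
    """Hypothetical issue/discussion record."""
    number: int
    title: str
    body: str


def keyword_hits(items: Iterable[Item], keywords: Iterable[str]) -> List[Item]:
    """Return items whose title or body contains any keyword (case-insensitive)."""
    needles = [k.lower() for k in keywords if k]  # drop empty keywords
    hits = []
    for item in items:
        text = f"{item.title}\n{item.body}".lower()
        if any(n in text for n in needles):
            hits.append(item)
    return hits
```

Substring matching is the simplest choice; a real scanner might prefer word-boundary matching to avoid hits like "crash" inside "crashpad".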

Model cost comparison

Approximate cost for a medium implementation (12,000 prompt + 2,500 completion tokens, 14,500 total).

| Model | Input price | Output price | Cost for medium run | Relative |
|---|---|---|---|---|
| deepseek-chat | $0.14/M | $0.28/M | ~$0.002 | 1× (baseline) |
| deepseek-reasoner | $0.55/M | $2.19/M | ~$0.012 | ~5× |
| gpt-4o | $2.50/M | $10.00/M | ~$0.055 | ~23× |
| claude-sonnet-4 | $3.00/M | $15.00/M | ~$0.074 | ~31× |
| claude-3.5-haiku | $0.80/M | $4.00/M | ~$0.020 | ~8× |

Recommendation: DeepSeek-Chat is the best price/performance default for routine PRs. Use DeepSeek-Reasoner for complex issues that require multi-step reasoning, and Claude/GPT-4o when correctness is critical and cost is secondary.
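The cost of a run is prompt tokens × input price plus completion tokens × output price, with each price quoted per million tokens. The sketch below applies that formula to the per-million prices in the model table. `run_cost`, `compare`, and `PRICES` are illustrative names, and the prices are the snapshot listed in this page, not live values.

```python
from typing import Dict, Tuple

# Per-million-token prices in USD (input, output), as listed in the model table.
PRICES: Dict[str, Tuple[float, float]] = {
    "deepseek-chat": (0.14, 0.28),
    "deepseek-reasoner": (0.55, 2.19),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-3.5-haiku": (0.80, 4.00),
}


def run_cost(prompt_tokens: int, completion_tokens: int,
             input_per_m: float, output_per_m: float) -> float:
    """USD cost: each token class is billed at its own per-million rate."""
    return (prompt_tokens * input_per_m + completion_tokens * output_per_m) / 1_000_000


def compare(prompt_tokens: int, completion_tokens: int,
            baseline: str = "deepseek-chat") -> Dict[str, Tuple[float, float]]:
    """Map each model to (cost, cost relative to the baseline model)."""
    base = run_cost(prompt_tokens, completion_tokens, *PRICES[baseline])
    result: Dict[str, Tuple[float, float]] = {}
    for model, (input_per_m, output_per_m) in PRICES.items():
        cost = run_cost(prompt_tokens, completion_tokens, input_per_m, output_per_m)
        result[model] = (cost, cost / base)
    return result


if __name__ == "__main__":
    # Medium implementation split: 12,000 prompt + 2,500 completion tokens.
    for model, (cost, rel) in compare(12_000, 2_500).items():
        print(f"{model:<18} ${cost:.4f}  {rel:4.1f}x")
```

Because output tokens are priced several times higher than input tokens for every model listed, the relative figures shift with the prompt/completion split; a completion-heavy run widens the gap between DeepSeek-Chat and the other models.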

Real-world observations

These figures come from running RepoKeeper on its own repository (dogfooding) and a handful of community repositories.

  • Average PR cost: $0.004 (range: $0.0003 – $0.012)
  • Verification fix-loop retries: ~15% of PRs need 1 retry; < 5% need 2.
  • Smart file selection: Reduces context tokens by ~55% on average.
  • Incremental re-review: Costs ~60% of a full review (the diff is shorter, so less context is needed).

Benchmarks last updated: 2026-05-10. Run repokeeper pricing to check whether pricing data is still current.