Benchmarks
Estimated cost and performance for representative RepoKeeper operations. All figures are approximate and based on real-world usage with DeepSeek models (the default). Actual results vary with repository size, issue complexity, and model choice.
Implementation Agent
Cost to generate and verify a PR from an issue. Measured end-to-end: context collection → LLM call(s) → verification → PR creation.
| Scenario | Files in context | Prompt tokens | Completion tokens | Total tokens | Cost (DeepSeek‑Chat) | Duration |
|---|---|---|---|---|---|---|
| Simple fix (1 file, typo) | 8 | 3,200 | 800 | 4,000 | < $0.001 | ~15 s |
| Medium fix (3 files, bug) | 25 | 12,000 | 2,500 | 14,500 | ~$0.002 | ~30 s |
| Feature (5 files, new test) | 40 | 22,000 | 5,000 | 27,000 | ~$0.004 | ~50 s |
| Complex (10+ files, refactor) | 60 | 40,000 | 8,000 | 48,000 | ~$0.008 | ~90 s |
| Large (max context, big repo) | 60 | 55,000 | 8,000 | 63,000 | ~$0.010 | ~120 s |
Notes:
- "Duration" is wall-clock time from issue trigger to PR open, covering context collection, LLM streaming, and git operations.
- Time spent running verification (lint + test) varies widely with project size and is not included in these durations.
- DeepSeek-Reasoner costs ~4× more per input token and ~8× more per output token, but it can handle significantly more complex issues in a single pass, reducing retry attempts.
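The Cost column follows from the token counts and the DeepSeek-Chat prices listed under Model cost comparison ($0.14/M input, $0.28/M output). A minimal sketch of that arithmetic, assuming flat per-token pricing with no cache-hit discounts; `request_cost` is a hypothetical helper for illustration, not part of RepoKeeper:

```python
# Hypothetical helper: estimates the USD cost of a run from its token counts.
# Assumes flat per-million-token pricing (no cache-hit discounts).

DEEPSEEK_CHAT_INPUT_PER_M = 0.14   # USD per 1M prompt tokens
DEEPSEEK_CHAT_OUTPUT_PER_M = 0.28  # USD per 1M completion tokens


def request_cost(prompt_tokens: int, completion_tokens: int,
                 input_per_m: float = DEEPSEEK_CHAT_INPUT_PER_M,
                 output_per_m: float = DEEPSEEK_CHAT_OUTPUT_PER_M) -> float:
    """Return the estimated USD cost for the given prompt/completion tokens."""
    return (prompt_tokens * input_per_m + completion_tokens * output_per_m) / 1_000_000


# (scenario, prompt tokens, completion tokens) from the Implementation Agent table
SCENARIOS = [
    ("Simple fix", 3_200, 800),
    ("Medium fix", 12_000, 2_500),
    ("Feature", 22_000, 5_000),
    ("Complex", 40_000, 8_000),
    ("Large", 55_000, 8_000),
]

if __name__ == "__main__":
    for name, prompt, completion in SCENARIOS:
        total = prompt + completion
        print(f"{name:<12} {total:>7,} tokens  ${request_cost(prompt, completion):.5f}")
```

Running it prints roughly $0.00067, $0.00238, $0.00448, $0.00784, and $0.00994 for the five scenarios.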
Two-step smart selection savings
When `agent.smart_file_selection` is enabled (the default), RepoKeeper first
sends only a file listing, lets the LLM pick ~10–30 relevant files, and only
then reads their content. This cuts context tokens by 40–70% compared with
direct "send everything" collection, reducing both cost and latency.
| Strategy | Files read | Context tokens | Context cost (DeepSeek‑Chat input) |
|---|---|---|---|
| Direct (60 files) | 60 | ~55,000 | ~$0.008 |
| Two-step (LLM picks 15) | 15 | ~18,000 | ~$0.003 |
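The two-step flow described above can be sketched as follows. Everything here is an assumption for illustration: `pick_files` stands in for the LLM call that chooses files from the listing, tokens are estimated with a crude 4-characters-per-token rule, and the function names are hypothetical rather than RepoKeeper's actual code.

```python
from typing import Callable, Dict, List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): roughly 4 characters per token."""
    return len(text) // 4


def direct_context(repo: Dict[str, str], max_files: int = 60) -> Dict[str, str]:
    """Baseline: read up to max_files files without consulting the LLM."""
    return {path: repo[path] for path in sorted(repo)[:max_files]}


def two_step_context(
    repo: Dict[str, str],
    pick_files: Callable[[List[str]], List[str]],
    max_files: int = 30,
) -> Tuple[Dict[str, str], int]:
    """Step 1: send only paths; step 2: read only the files the picker chose.

    Returns the selected files plus the tokens spent sending the listing.
    """
    paths = sorted(repo)
    listing_tokens = estimate_tokens("\n".join(paths))
    chosen = pick_files(paths)[:max_files]
    # Drop any path the picker returns that does not exist in the repo.
    files = {path: repo[path] for path in chosen if path in repo}
    return files, listing_tokens


def context_tokens(files: Dict[str, str]) -> int:
    """Sum the estimated tokens of every file body."""
    return sum(estimate_tokens(body) for body in files.values())


if __name__ == "__main__":
    # Fake repo: 60 files of 3,600 characters (~900 tokens) each.
    repo = {f"src/module_{i:02d}.py": "x" * 3_600 for i in range(60)}
    # Stand-in for the LLM picker: keep the first 15 paths.
    files, listing_tokens = two_step_context(repo, lambda paths: paths[:15])
    direct = context_tokens(direct_context(repo))
    two_step = listing_tokens + context_tokens(files)
    print(f"direct: {direct:,} tokens, two-step: {two_step:,} tokens "
          f"({1 - two_step / direct:.0%} fewer)")
```

The listing's own tokens are counted toward the two-step total because step 1 is not free. The filter on unknown paths and the `max_files` cap are defensive choices made for this sketch, since a real LLM answer cannot be trusted blindly.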
Code Review
Cost to review a pull request with inline comments.
| PR size | Files changed | Diff lines | Prompt tokens | Completion tokens | Cost (DeepSeek‑Chat) | Duration |
|---|---|---|---|---|---|---|
| Small (1 file) | 1 | 20 | 5,000 | 1,500 | ~$0.001 | ~10 s |
| Medium (5 files) | 5 | 150 | 15,000 | 3,000 | ~$0.003 | ~25 s |
| Large (15 files) | 15 | 500 | 35,000 | 6,000 | ~$0.007 | ~50 s |
Radar Scan
Cost to scan recent issues and discussions for keyword matches.
| Items scanned | Hits classified | Prompt tokens | Cost (DeepSeek‑Chat) | Duration |
|---|---|---|---|---|
| 30 issues | 5 | 8,000 | ~$0.001 | ~12 s |
| 50 issues | 12 | 18,000 | ~$0.003 | ~25 s |
| 100 issues | 30 | 40,000 | ~$0.006 | ~50 s |
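The "Items scanned" versus "Hits classified" columns suggest that only keyword matches reach the LLM. A minimal sketch of such a prefilter, assuming a case-insensitive substring match on title and body; `Item` and `keyword_hits` are hypothetical names, and RepoKeeper's actual matching rules are not documented here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Item:
    """Hypothetical stand-in for an issue or discussion."""
    number: int
    title: str
    body: str


def keyword_hits(items: Iterable[Item], keywords: List[str]) -> List[Item]:
    """Cheap local prefilter: keep items whose title or body contains a keyword."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for item in items:
        text = f"{item.title}\n{item.body}".lower()
        if any(k in text for k in lowered):
            hits.append(item)
    return hits


if __name__ == "__main__":
    items = [
        Item(1, "Crash on startup", "Segfault when config is missing"),
        Item(2, "Docs typo", "Small wording fix in README"),
        Item(3, "Feature request", "Please add a --dry-run flag"),
    ]
    hits = keyword_hits(items, ["crash", "segfault", "dry-run"])
    # Only these items would be sent to the (hypothetical) LLM classifier.
    print([item.number for item in hits])  # prints [1, 3]
```

Under this design, LLM cost tracks the number of hits rather than the number of items scanned, because the substring check itself costs no tokens.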
Model cost comparison
Approximate cost for a medium implementation (12,000 prompt + 2,500 completion = 14,500 tokens). Relative multiples are computed from unrounded costs.
| Model | Input price | Output price | Cost for 14.5K tokens | Relative |
|---|---|---|---|---|
| deepseek-chat | $0.14/M | $0.28/M | ~$0.002 | 1× (baseline) |
| deepseek-reasoner | $0.55/M | $2.19/M | ~$0.012 | 5× |
| gpt-4o | $2.50/M | $10.00/M | ~$0.055 | 23× |
| claude-sonnet-4 | $3.00/M | $15.00/M | ~$0.074 | 31× |
| claude-3.5-haiku | $0.80/M | $4.00/M | ~$0.020 | 8× |
Recommendation: DeepSeek-Chat is the best price/performance default for routine PRs. Use DeepSeek-Reasoner for complex issues that require multi-step reasoning, and Claude/GPT-4o when correctness is critical and cost is secondary.
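The Cost and Relative columns in the table above can be recomputed from the listed prices. A minimal sketch, assuming the medium scenario's split of 12,000 prompt and 2,500 completion tokens and flat per-token pricing; the `PRICES` mapping simply restates the table (prices change over time), and `compare_models` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# (input, output) prices in USD per 1M tokens, restated from the table.
# Treat these as a snapshot; provider prices change.
PRICES: Dict[str, Tuple[float, float]] = {
    "deepseek-chat": (0.14, 0.28),
    "deepseek-reasoner": (0.55, 2.19),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-3.5-haiku": (0.80, 4.00),
}

PROMPT_TOKENS, COMPLETION_TOKENS = 12_000, 2_500  # medium implementation


def cost(model: str, prompt: int = PROMPT_TOKENS,
         completion: int = COMPLETION_TOKENS) -> float:
    """USD cost of one run for the given model."""
    input_per_m, output_per_m = PRICES[model]
    return (prompt * input_per_m + completion * output_per_m) / 1_000_000


def compare_models(baseline: str = "deepseek-chat") -> List[Tuple[str, float, float]]:
    """Return (model, USD cost, multiple of the baseline cost) for every model."""
    base = cost(baseline)
    return [(model, cost(model), cost(model) / base) for model in PRICES]


if __name__ == "__main__":
    for model, usd, rel in compare_models():
        print(f"{model:<18} ${usd:.4f}  {rel:.1f}x")
```

Because the split between prompt and completion tokens matters, models with a steep output premium look relatively more expensive on completion-heavy runs.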
Real-world observations
These figures come from running RepoKeeper on its own repository (dogfooding) and a handful of community repositories.
- Average PR cost: $0.004 (range: $0.0003 – $0.012)
- Verification fix loop retries: ~15% of PRs need 1 retry; < 5% need 2.
- Smart file selection: Reduces context tokens by ~55% on average.
- Incremental re-review: Costs ~60% of a full review (diffs are shorter, less context needed).
Benchmarks last updated: 2026-05-10. Run `repokeeper pricing` to check
whether the pricing data is still current.