How It Works

pipguard's design has one rule: code must never execute during scanning.

Architecture

pipguard install X
       │
       ▼
pip download --prefer-binary X    ← downloads wheel/sdist, no code execution
       │
       ▼
Detect sdist fallback             ← exit 2 if sdist detected (unless --allow-sdist)
       │
       ▼
Extract archive (zipfile/tarfile) ← never executes code
       │
       ▼
AST scan all .py files            ← parallel, ThreadPoolExecutor
  setup.py, pyproject.toml, *.pth ← CRITICAL/HIGH scope
  all other .py                   ← MEDIUM/LOW scope
       │
       ▼
Risk scoring:
  CRITICAL → block (exit 1)
  HIGH     → block (exit 1)
  MEDIUM   → warn + confirm
  LOW      → warn + confirm
  CLEAN    → install silently
       │
       ▼
pip install --no-index            ← installs FROM SCANNED FILES (TOCTOU-safe)
    --find-links /tmp/pipguard-XX
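
The gating steps in the diagram can be sketched as one small decision function. This is an illustration under stated assumptions, not pipguard's actual implementation: the names `Severity`, `is_sdist`, and `decide` are hypothetical, and the sdist check here only looks at file suffixes. Exit codes appear only where the diagram states them.

```python
from enum import IntEnum
from typing import Iterable, Optional, Tuple


class Severity(IntEnum):
    """Severity levels from the diagram, ordered so max() picks the worst."""
    CLEAN = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Assumption: a download counts as an sdist if it has a source-archive suffix.
SDIST_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip")


def is_sdist(filename: str) -> bool:
    """Hypothetical helper: classify a downloaded file by its suffix only."""
    return filename.lower().endswith(SDIST_SUFFIXES)


def decide(
    filenames: Iterable[str],
    findings: Iterable[Severity],
    allow_sdist: bool = False,
) -> Tuple[str, Optional[int]]:
    """Return (action, exit_code); exit_code is None where the diagram gives none."""
    if not allow_sdist and any(is_sdist(name) for name in filenames):
        return ("abort-sdist", 2)
    worst = max(findings, default=Severity.CLEAN)
    if worst >= Severity.HIGH:
        return ("block", 1)
    if worst >= Severity.LOW:
        return ("warn-and-confirm", None)
    return ("install", None)
```

Ordering the levels as integers keeps the policy in one comparison chain: the worst finding alone determines the outcome, so a single HIGH finding blocks the install regardless of how many LOW findings accompany it.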

Why Pre-Install?

Most package security tools (for example pip-audit and Safety) work post-hoc: they check installed packages against databases of known vulnerabilities. This means:

Zero-day blind spot — a new attack not yet in the database walks straight through
Race condition — the malicious code has already run by the time the tool checks

pipguard reverses the order. Before anything is installed, it asks of each behavior it finds: is this something a pip install should ever be allowed to do?

Regardless of whether the package is on any watchlist, the answer to "reads ~/.ssh/id_rsa and sends it over a network" is always no.

TOCTOU Safety

A subtle attack vector is time-of-check to time-of-use (TOCTOU): a clean file is scanned, then swapped for a malicious one before install.

pipguard counters this by:

Downloading the archive to a temp directory
Scanning the files in place in that temp directory
Running pip install --no-index --find-links /tmp/pipguard-XX — installing the exact files that were scanned

The archive is never re-downloaded or re-extracted after scanning.
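
The three steps above can be sketched as follows. This is a minimal illustration, not pipguard's code: `build_pip_commands`, `guarded_install`, and the `scan` callback are hypothetical names, and the only pip options used are `--prefer-binary`, `--dest`, `--no-index`, and `--find-links`. The point is that the download and the install share one directory, so pip can only install what was scanned.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple


def build_pip_commands(requirement: str, workdir: Path) -> Tuple[List[str], List[str]]:
    """Build the download and install command lines around a single directory."""
    pip = [sys.executable, "-m", "pip"]
    download = pip + ["download", "--prefer-binary", "--dest", str(workdir), requirement]
    # --no-index disables the package index, so only files in workdir can be used.
    install = pip + ["install", "--no-index", "--find-links", str(workdir), requirement]
    return download, install


def guarded_install(requirement: str, scan: Callable[[Path], bool]) -> bool:
    """Download, scan, then install from the same temp directory.

    `scan` is a hypothetical callback that returns True when the files pass.
    """
    with tempfile.TemporaryDirectory(prefix="pipguard-") as tmp:
        workdir = Path(tmp)
        download, install = build_pip_commands(requirement, workdir)
        subprocess.run(download, check=True)
        if not scan(workdir):
            return False  # nothing is installed if the scan fails
        subprocess.run(install, check=True)
        return True
```

A call such as `guarded_install("requests", scan=my_scanner)` would run the full sequence, where `my_scanner` stands in for whatever scanner is used.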

AST Scanning

pipguard uses Python's built-in ast module — no third-party dependencies — to parse .py files into abstract syntax trees and walk the nodes looking for dangerous patterns.
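
As a rough sketch of this technique, the visitor below parses source text with `ast.parse` (which never runs the code) and records two of the patterns this page describes. The class and function names are hypothetical, the rule set is deliberately tiny, and the matching is purely syntactic: it would miss aliased imports such as `import os as o`. The parallel helper assumes a thread pool, as the architecture diagram suggests.

```python
import ast
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Finding = Tuple[int, str]  # (line number, rule label)


def dotted_name(node: ast.AST) -> Optional[str]:
    """Turn Name/Attribute chains like os.system into 'os.system'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


class DangerVisitor(ast.NodeVisitor):
    """Hypothetical visitor: flags os.system/os.popen and eval(base64.b64decode(...))."""

    def __init__(self) -> None:
        self.findings: List[Finding] = []

    def visit_Call(self, node: ast.Call) -> None:
        name = dotted_name(node.func)
        if name in {"os.system", "os.popen"}:
            self.findings.append((node.lineno, name))
        elif name in {"eval", "exec"} and node.args:
            inner = node.args[0]
            if isinstance(inner, ast.Call) and dotted_name(inner.func) == "base64.b64decode":
                self.findings.append((node.lineno, "obfuscated eval"))
        self.generic_visit(node)  # keep walking into nested calls


def scan_source(source: str) -> List[Finding]:
    """Parse without executing, then walk the tree."""
    visitor = DangerVisitor()
    visitor.visit(ast.parse(source))
    return visitor.findings


def scan_many(sources: Dict[str, str]) -> Dict[str, List[Finding]]:
    """Scan several files' contents concurrently, keyed by file name."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(scan_source, sources.values())
        return dict(zip(sources.keys(), results))
```

Note that `ast.parse` raises `SyntaxError` on input it cannot parse; a real scanner would need to decide how to report such files.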

What gets flagged

CRITICAL

Pattern	Example
`.pth` file with executable Python	`import os; os.system(...)` in `.pth`
Obfuscated eval	`eval(base64.b64decode(...))`
Network in `setup.py` / install hooks	`urllib.request.urlopen(...)` in `setup.py`

HIGH

Pattern	Example
Credential path read in install hooks	`open('~/.ssh/id_rsa')` in `setup.py`
Shell subprocess in install hooks	`subprocess.run(..., shell=True)`
`os.system()` / `os.popen()` in install hooks	`os.system('curl ...')`

MEDIUM

Pattern	Example
Network in runtime code	`urllib.request.urlopen(...)` in `utils.py`
Sensitive env var access	`os.environ.get('AWS_SECRET_ACCESS_KEY')`

LOW

Pattern	Example
Dynamic imports	`importlib.import_module(name)`
`__import__()`	`__import__(variable)`
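
The `.pth` pattern listed above deserves a concrete illustration, because `.pth` files are not Python modules and cannot simply be handed to the AST parser. Python's `site` module executes any line in a `.pth` file that starts with `import` followed by a space or tab, and treats other non-comment lines as paths. The sketch below finds such lines using that rule; `executable_pth_lines` is a hypothetical helper, not part of pipguard's documented API.

```python
from typing import List, Tuple


def executable_pth_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for .pth lines that site.py would execute.

    Only lines beginning with 'import' plus a space or tab are executed;
    everything else is a path entry, a comment, or blank.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith(("import ", "import\t")):
            hits.append((lineno, line.strip()))
    return hits


sample = "../src\n# a comment\nimport os; os.system('id')\n"
print(executable_pth_lines(sample))
```

Each flagged line could then be passed to an AST-based check, since the text after `import` is ordinary Python.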

Seed Allowlist

Some packages legitimately access credentials as part of their core purpose. pipguard ships with a seed allowlist that downgrades findings for these packages from HIGH to MEDIUM (CRITICAL findings are never downgraded):

keyring, keyrings.alt, boto3, botocore, awscli, paramiko, google-auth, google-cloud-storage, google-cloud-bigquery, google-cloud-core, azure-identity

Full allowlist reference →
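
The downgrade rule can be sketched in a few lines. The function names are hypothetical, and comparing names after PEP 503-style normalization (lowercase, with runs of `-`, `_`, and `.` collapsed to `-`) is an assumption about how a matcher might treat spelling variants, not a documented pipguard behavior.

```python
import re

# Package names copied from the seed allowlist above.
SEED_ALLOWLIST = frozenset({
    "keyring", "keyrings.alt", "boto3", "botocore", "awscli", "paramiko",
    "google-auth", "google-cloud-storage", "google-cloud-bigquery",
    "google-cloud-core", "azure-identity",
})


def normalize(name: str) -> str:
    """PEP 503-style normalization so 'Keyrings_Alt' matches 'keyrings.alt'."""
    return re.sub(r"[-_.]+", "-", name).lower()


_ALLOWED = frozenset(normalize(name) for name in SEED_ALLOWLIST)


def effective_severity(package: str, severity: str) -> str:
    """Downgrade HIGH to MEDIUM for allowlisted packages; leave all else as-is."""
    if severity == "HIGH" and normalize(package) in _ALLOWED:
        return "MEDIUM"
    return severity  # CRITICAL is never reduced, even for allowlisted packages
```

Under this rule an allowlisted package still triggers a warning and confirmation rather than installing silently.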

Limitations

Phase 1 scope

These are known limitations of the current static-analysis approach.

Obfuscation — multi-layer obfuscation (e.g. exec(compile(...)) wrapped multiple times) may evade detection
C extensions — .so / .pyd binaries are opaque to AST scanning (flagged as UNKNOWN)
Python/pip only — no npm, cargo, or go module support
Phase 2 (in design) — seccomp/eBPF sandbox for capability-level interception at runtime
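
To illustrate the C-extension limitation: a scanner can detect that native binaries are present without being able to inspect them. The sketch below lists `.so` and `.pyd` members of a wheel using `zipfile`, reading only the archive's table of contents. `find_native_binaries` is a hypothetical helper, and matching on file suffix alone is a simplifying assumption.

```python
import zipfile
from typing import List

NATIVE_SUFFIXES = (".so", ".pyd")


def find_native_binaries(wheel_path: str) -> List[str]:
    """List archive members that look like compiled extension modules.

    Nothing is extracted or executed; only member names are read.
    """
    with zipfile.ZipFile(wheel_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(NATIVE_SUFFIXES)
        )
```

Any names returned here would correspond to the files reported as UNKNOWN, since their contents cannot be parsed into an AST.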