PyPI and Prejudice: A Dependency Confusion Love Story
Featuring typos, package takeovers, and a man who should be watching Bluey with his kids instead of registering fake Python packages.
TL;DR: find non-existent packages before CI does, block unknown names with a pre-commit hook, and serve dependencies from a single private index.
awk -F'[#=<>!~;[ ]' '{print $1}' requirements.txt | rg -v '^(-|$)' | pyspi
Use it: https://github.com/cybercdh/pyspi
The package was called py-sqllite3. Two Ls. One reality check. Nestled in a very serious project with badges and sponsors and everything. Normal adults report the typo and go to bed. I wrote a tool, registered the typo to prove the point, and went to bed slightly smug.
Meet Nate. Nate types fast, lints hard, and once typed an extra L. Builds are obedient golden retrievers: you throw a package name and they bring back whatever smells best, usually the highest version they can find on any index they know about. If there's a public package matching your private one, or a typo someone has squatted, your CI will install it with the innocence of a Disney protagonist.
I built pyspi for this exact 2am gremlin energy: you hand it names, it asks PyPI if they exist, and then it tells you “nope” in a helpful tone.
Quick win:
# audit a requirements file for ghosts
awk -F'[#=<>!~;[ ]' '{print $1}' requirements.txt | rg -v '^(-|$)' | pyspi
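If you'd rather not trust awk with requirement syntax, here's a minimal Python sketch of the name-extraction step. It assumes a plain pip requirements file, keeps only the leading project name (following the PEP 508 naming rule), and skips comments, pip options such as -r or --hash, and bare URL/VCS lines instead of resolving them. `requirement_names` is a hypothetical helper, not part of pyspi.

```python
import re
from typing import Iterable, List

# PEP 508 project name: letter/digit at both ends, with ., _ or - allowed in between.
NAME_RE = re.compile(r"^([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")
# Bare URL or VCS requirement such as git+https://... (no project name up front).
URL_RE = re.compile(r"^[A-Za-z0-9.+-]+://")


def requirement_names(lines: Iterable[str]) -> List[str]:
    """Return the leading project name of each requirement line, in file order."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-") or URL_RE.match(line):
            continue  # blank line, pip option (-r, -e, --hash, ...), or bare URL
        match = NAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names
```

Pass it an open requirements.txt and print the result one name per line to pipe into pyspi or any other checker.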
If you don’t want new tools: PyPI has a JSON API. Bring your own curl.
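pkg=py-sqllite3
# -f: treat HTTP errors (404 = no such project) as failure; -L: follow redirects, e.g. to the canonical project name
name=$(curl -fsSL "https://pypi.org/pypi/$pkg/json" | jq -r '.info.name // empty' 2>/dev/null)
if [ -n "$name" ]; then echo "$pkg: $name"; else echo "$pkg: NOT_FOUND"; fi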
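The same lookup in Python, as a sketch: it normalizes the name the way PEP 503 describes, asks the public JSON API, and treats only a 404 as "does not exist", re-raising anything else so an outage is never reported as NOT_FOUND. The function names are hypothetical and this is not how pyspi is implemented; the `opener` parameter exists only so the lookup can be exercised without the network.

```python
import json
import re
import urllib.error
import urllib.request
from typing import Callable, Optional

PYPI_JSON = "https://pypi.org/pypi/{name}/json"


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of -, _ and . become a single '-', lower-cased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_name(name: str, opener: Callable = urllib.request.urlopen) -> Optional[str]:
    """Return PyPI's canonical project name, or None if the project does not exist."""
    url = PYPI_JSON.format(name=normalize(name))
    try:
        # urlopen follows redirects, so a non-canonical spelling still resolves.
        with opener(url, timeout=10) as resp:
            return json.load(resp)["info"]["name"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no such project: a name someone could register
        raise  # 5xx, rate limiting, etc.: not evidence either way
```

Calling `pypi_name("py-sqllite3")` with the default opener does a live request; a `None` result is your "nope".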
Dependency confusion in one paragraph: you meant “pick up from the private loading bay,” but the courier spotted a public door with the same label and grabbed whatever was sitting there instead. Now your app depends on a box of rocks named py-sqllite3.
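The "public door" is less magic than it sounds. With --extra-index-url, pip gathers candidates from every index into one pool and picks the best match, which usually means the highest version. The toy model below keeps only that rule; pip's real resolver uses full PEP 440 version parsing and also weighs wheels against sdists, and the acme-utils package and its versions are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    version: str
    index: str  # the index URL that advertised this release


def version_key(version: str) -> Tuple[int, ...]:
    """Simplified ordering: dotted integers only, no pre-releases or epochs."""
    return tuple(int(part) for part in version.split("."))


def pick(candidates: List[Candidate]) -> Candidate:
    """Toy model of pip with --index-url plus --extra-index-url.

    Every index contributes to one pool and the highest version wins;
    the order in which the indexes were configured plays no part.
    """
    return max(candidates, key=lambda c: version_key(c.version))


# Hypothetical internal package, plus an attacker's public upload of the same name.
private = Candidate("acme-utils", "1.4.0", "https://pypi.mycorp.local/simple")
public = Candidate("acme-utils", "99.0.0", "https://pypi.org/simple")

print(pick([private, public]).index)  # prints https://pypi.org/simple
```

Swapping the list order changes nothing, which is the whole problem: "private first" is not a priority pip honors.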
My harmless PoC looked like this:
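# setup.py (retired)
from setuptools import setup

setup(
    name='py-sqllite3',
    version='0.0.0',
    py_modules=['noop'],  # one do-nothing module
    install_requires=[],  # no dependencies pulled in
)

# noop.py
# Runs only if something imports noop; installing the package executes none of it.
print("[py-sqllite3] You installed a typo. Please use 'pysqlite3'.")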
Someone installed it. Probably Nate. Maybe his CI. I unpublished, disclosed, and drank water like a responsible adult.
Take-home practices that make future-you sleep:
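- Lock with hashes: generate them with pip-compile --generate-hashes (from pip-tools), then install with pip install --require-hashes -r requirements.txt so pip refuses any artifact whose hash doesn't match.
- Pre-commit guard: block unknown names.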
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: pypi-exists
        name: Disallow unknown PyPI packages
        language: system
        files: ^requirements\.txt$
        pass_filenames: false
        entry: >-
          bash -c 'awk -F"[#=<>!~;[ ]" "{print \$1}" requirements.txt | rg -v "^(-|$)"
          | while read -r p; do curl -fsSL "https://pypi.org/pypi/$p/json" >/dev/null
          || { echo "unknown: $p"; exit 1; }; done'
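- Audit: pip-audit for known vulnerabilities, and uv pip check to confirm installed packages have compatible dependencies.
- Namespacing: serve everything from one private index. pip does not prefer --index-url over --extra-index-url; it pools candidates from every configured index and installs the best match (normally the highest version), so a public upload can outrank your private package. Point pip.conf at a single private index that proxies PyPI rather than listing both. uv is stricter out of the box: its default first-index strategy stops at the first index that has the package.
# pip.conf
[global]
# One index only: a private proxy that serves internal packages and mirrors PyPI.
# No extra-index-url, so pip never pools candidates from public PyPI directly.
index-url = https://pypi.mycorp.local/simple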
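Two of these guardrails are easy to lint in CI. A minimal sketch, assuming pip.conf is INI-style (it is) and requirements.txt uses pip's --hash syntax: one function flags any extra-index-url, since pip pools candidates from every configured index, and the other lists entries without a hash, since pip's hash-checking mode is all-or-nothing. Both helpers are hypothetical and not part of pip or pyspi.

```python
import configparser
from typing import List


def pip_conf_warnings(text: str) -> List[str]:
    """Flag any pip.conf section that sets extra-index-url (hypothetical lint)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return [
        f"[{section}] sets extra-index-url; pip pools candidates from every index"
        for section in parser.sections()
        if parser.has_option(section, "extra-index-url")
    ]


def unhashed_requirements(text: str) -> List[str]:
    """Return requirement entries that carry no --hash option (hypothetical lint)."""
    # Join backslash continuations so "pkg==1.0 \" + "--hash=..." become one entry.
    joined = text.replace("\\\n", " ")
    missing = []
    for line in joined.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments
        if not entry or entry.startswith("-"):
            continue  # blank line or a pip option such as -r or --index-url
        if "--hash=" not in entry:
            missing.append(entry.split()[0])
    return missing
```

Run both from the same hook or CI step and fail the build when either list comes back non-empty.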
And yes, use pyspi when you’re triaging a stray requirements.txt from the wilderness. It’s faster than arguing with Nate about whether “it worked on my machine.”
Moral: the difference between “solid supply chain” and “lol, oops” is one letter and two guardrails. Choose both.
Copy me:
# Minimal audit + fix sketch
awk -F'[#=<>!~;[ ]' '{print $1}' requirements.txt | rg -v '^(-|$)' | pyspi | tee missing.txt
git grep -nF "py-sqllite3" | tee todo-fixes.txt
Report snippet (to maintainer or internal team):
Title: Non-existent package referenced in requirements (dependency confusion risk)
Repo: org/repo
Package(s): py-sqllite3 (NOT_FOUND on PyPI)
Evidence:
- pyspi/JSON query indicates package does not exist
Impact: If someone registers this name on public PyPI, CI could install an attacker-controlled package.
Recommended fix: Correct the package name (e.g., pysqlite3), lock with hashes, and resolve dependencies from a single private index (no extra-index-url in pip.conf).
Actions taken: No package registration or exploitation; disclosure only.