Whitelist vs blacklist in simple parsers
Beginner
Defensive APIs: validation, sanitization & exceptions
Created by Pavel
· 29.04.2026 at 19:10 UTC
A whitelist allows only known-good characters (for example, [a-z0-9_-] for slugs). A blacklist bans known-bad tokens and can never be complete: there are always more Unicode homoglyphs, zero-width spaces, and confusables than the list anticipates.
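The difference can be seen in a minimal Python sketch. The blacklist contents, the function names, and the sample strings are assumptions made up for illustration; only the [a-z0-9_-] alphabet comes from the text above.

```python
import re

# Hypothetical blacklist: the characters someone happened to think of banning.
BLACKLIST = set(" /\\;'\"<>")

# Whitelist from the text: lowercase ASCII letters, digits, underscore, hyphen.
SLUG_RE = re.compile(r"[a-z0-9_-]+")


def passes_blacklist(s: str) -> bool:
    """Accept any non-empty string that contains no blacklisted character."""
    return bool(s) and not any(ch in BLACKLIST for ch in s)


def passes_whitelist(s: str) -> bool:
    """Accept only strings made entirely of whitelisted characters."""
    # fullmatch() requires the whole string to match, so nothing can
    # hide before or after an otherwise valid slug.
    return SLUG_RE.fullmatch(s) is not None


samples = {
    "plain": "my-slug_01",
    "zero_width": "my\u200bslug",  # contains U+200B ZERO WIDTH SPACE
    "homoglyph": "p\u0430ypal",    # U+0430 is Cyrillic "a", not Latin "a"
}

for label, value in samples.items():
    print(label, passes_blacklist(value), passes_whitelist(value))
```

Both checks accept the plain slug, but only the whitelist rejects the zero-width space and the Cyrillic look-alike; the blacklist lets them through because nobody listed them.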
For dataset IDs, normalised column names, and MLflow experiment names, pick a small alphabet and reject everything else. Heavy-duty text processing often needs ICU or dedicated libraries; for teaching kernels, explicit sets are enough.
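A small-alphabet validator built on an explicit set might look like the sketch below. The names `ALLOWED`, `MAX_LEN`, `InvalidNameError`, and `validate_name` are hypothetical, and the 64-character limit is an assumption, not a rule taken from MLflow or any other tool.

```python
import string

# Explicit allowed set: lowercase ASCII letters, digits, underscore, hyphen.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + "_-")
MAX_LEN = 64  # assumed limit for this sketch, not an MLflow requirement


class InvalidNameError(ValueError):
    """Raised when a name contains anything outside the allowed set."""


def validate_name(name: str, *, max_len: int = MAX_LEN) -> str:
    """Return the name unchanged if it is valid, otherwise raise."""
    if not isinstance(name, str):
        raise InvalidNameError(f"expected str, got {type(name).__name__}")
    if not name:
        raise InvalidNameError("name is empty")
    if len(name) > max_len:
        raise InvalidNameError(f"name is longer than {max_len} characters")
    # Collect every distinct disallowed character so the error is actionable.
    bad = sorted({ch for ch in name if ch not in ALLOWED})
    if bad:
        shown = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise InvalidNameError(f"disallowed characters: {shown}")
    return name


if __name__ == "__main__":
    print(validate_name("churn_model-v2"))
    try:
        validate_name("churn\u200bmodel")
    except InvalidNameError as exc:
        print("rejected:", exc)
```

Reporting offending characters as code points makes invisible input such as a zero-width space show up in the error message. The function rejects rather than silently repairs the name, so the caller decides whether to fix the input or abort.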
Unicode identifiers PEP: [1].
Sources