Whitelist vs blacklist in simple parsers

Beginner · Defensive APIs: validation, sanitization & exceptions
Created by Pavel · 29.04.2026 at 19:10 UTC

A whitelist allows only known-good characters, for example [a-z0-9_-] for slugs. A blacklist bans known-bad tokens and is never complete: new Unicode homoglyphs, zero-width spaces, and other confusables keep appearing.
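
As a minimal sketch of the difference, the snippet below runs the same inputs through a whitelist pattern and a deliberately incomplete blacklist. The names is_allowed, is_not_banned, and the contents of BANNED are illustrative assumptions, not part of any library.

```python
import re

# Whitelist: the full set of accepted characters is stated up front.
SLUG_RE = re.compile(r"[a-z0-9_-]+")

# Blacklist (hypothetical, deliberately incomplete): every bad token must be listed.
BANNED = {" ", "/", "\\", ";", "'", '"'}


def is_allowed(s: str) -> bool:
    """Accept only strings made entirely of whitelisted characters."""
    return SLUG_RE.fullmatch(s) is not None


def is_not_banned(s: str) -> bool:
    """Accept any string that contains none of the banned tokens."""
    return not any(tok in s for tok in BANNED)


zero_width = "my\u200bslug"   # contains U+200B ZERO WIDTH SPACE
homoglyph = "d\u0430ta"       # U+0430 is a Cyrillic letter that looks like Latin 'a'

for candidate in ("my-slug_01", zero_width, homoglyph):
    print(repr(candidate), is_allowed(candidate), is_not_banned(candidate))
```

The blacklist passes both crafted inputs because neither contains a listed token; the whitelist rejects them without having to know they exist.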

For dataset IDs, column name normalisation, and MLflow experiment names, pick a small alphabet and reject everything else. Heavy-duty text processing often needs ICU or a dedicated library; for teaching kernels, explicit character sets are enough.
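
One way this could look for column names is sketched below. The function name normalise_column_name, the chosen alphabet, and the rule that spaces become underscores are assumptions made for illustration; the point is that anything left outside the alphabet raises an exception instead of being silently dropped.

```python
import string

# Assumed alphabet for this sketch: ASCII lowercase letters, digits, underscore.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + "_")


def normalise_column_name(raw: str) -> str:
    """Trim, lowercase, turn spaces into underscores, then reject anything else."""
    name = raw.strip().lower().replace(" ", "_")
    if not name:
        raise ValueError("column name is empty")
    # Collect every character that falls outside the explicit alphabet.
    bad = sorted({ch for ch in name if ch not in ALLOWED})
    if bad:
        raise ValueError(f"unexpected characters in column name: {bad!r}")
    return name


print(normalise_column_name("  Total Price "))

try:
    normalise_column_name("pri\u200bce")   # hidden zero-width space
except ValueError as exc:
    print("rejected:", exc)
```

Raising on unexpected input keeps the failure visible to the caller; stripping the offending characters would let two different raw names collapse into the same column.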

Unicode identifiers PEP: [1].
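
Because Python 3 accepts many non-ASCII letters in identifiers, str.isidentifier() on its own is not an ASCII whitelist. The sketch below combines it with str.isascii(); the helper name is_ascii_identifier is hypothetical.

```python
def is_ascii_identifier(s: str) -> bool:
    """True only for identifiers made entirely of ASCII characters."""
    # str.isascii() requires Python 3.7 or newer.
    return s.isascii() and s.isidentifier()


latin = "data"
mixed = "d\u0430ta"   # second character is Cyrillic U+0430, not Latin 'a'

# Both strings are valid Python identifiers...
print(latin.isidentifier(), mixed.isidentifier())
# ...but only the first one passes the ASCII-only check.
print(is_ascii_identifier(latin), is_ascii_identifier(mixed))
```

Note that isidentifier() also returns True for reserved words such as class, so a stricter check may additionally consult keyword.iskeyword().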


Sources

Tasks
Question 1

For a slug that must match [a-z0-9_-]+, which strategy scales better as Unicode attack surfaces grow?

Hint

Allow known-good, not “ban known-bad”.

Question 2

Why is eval(user_input) unacceptable for parsing safe identifiers?

Hint

Never execute user strings.

Question 3

Implement is_slug(s): return True only if s is non-empty and every character is an ASCII lowercase letter, a digit, _ or - (use the string module). No regex required.

Hint

Set membership per character.
