Whitelist vs blacklist in simple parsers

Beginner · Defensive APIs: validation, sanitization & exceptions

Created by Pavel · 29.04.2026 at 19:10 UTC

A whitelist allows only known-good characters (for example, [a-z0-9_-] for slugs) and rejects everything else. A blacklist bans known-bad tokens, and that list is never finished: new Unicode homoglyphs, zero-width spaces, and other confusables keep turning up.
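
A minimal sketch of the difference, assuming a made-up blacklist (BANNED) and two hypothetical helper names. The whitelist is written as a regex over the slug alphabet from the text. re.fullmatch is used so the whole string, including any trailing newline, has to match.

```python
import re

# Hypothetical blacklist: the characters someone remembered to ban.
BANNED = set(" /\\<>\"';")

# Whitelist from the text: the slug alphabet [a-z0-9_-].
SLUG_RE = re.compile(r"[a-z0-9_-]+")


def passes_blacklist(s: str) -> bool:
    """Accept any non-empty string that avoids the banned characters."""
    return bool(s) and not any(ch in BANNED for ch in s)


def passes_whitelist(s: str) -> bool:
    """Accept only strings made entirely of the allowed alphabet."""
    return SLUG_RE.fullmatch(s) is not None


samples = {
    "plain slug": "my-slug_01",
    "zero-width space": "my\u200bslug",    # U+200B ZERO WIDTH SPACE
    "Cyrillic homoglyph": "p\u0430yload",  # U+0430 looks like Latin 'a'
}

for label, value in samples.items():
    print(f"{label}: blacklist={passes_blacklist(value)} "
          f"whitelist={passes_whitelist(value)}")
```

Both Unicode samples pass the blacklist because nobody listed those characters, while the whitelist rejects them without needing to know they exist.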

For dataset IDs, column name normalisation, and MLflow experiment names, pick a small alphabet and reject everything else. Heavy-duty text processing often needs ICU or a dedicated library; for teaching kernels, explicit character sets are enough.
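
One way this could look for a dataset ID, as a sketch only: the function name validate_dataset_id and the 64-character limit are assumptions for illustration, not part of any library. The function rejects bad input with an exception instead of trying to repair it.

```python
import re

_ID_RE = re.compile(r"[a-z0-9_-]+")
_MAX_LEN = 64  # hypothetical limit, chosen for illustration


def validate_dataset_id(raw: str) -> str:
    """Return raw unchanged if it is a valid ID, otherwise raise."""
    if not isinstance(raw, str):
        raise TypeError(f"dataset id must be str, got {type(raw).__name__}")
    if not raw or len(raw) > _MAX_LEN:
        raise ValueError(f"dataset id must be 1..{_MAX_LEN} characters long")
    if _ID_RE.fullmatch(raw) is None:
        # Report what was rejected so the caller can fix the input.
        offending = sorted(set(re.findall(r"[^a-z0-9_-]", raw)))
        raise ValueError(f"dataset id has disallowed characters: {offending!r}")
    return raw


print(validate_dataset_id("train-2024_v1"))

try:
    validate_dataset_id("Train 01")
except ValueError as exc:
    print("rejected:", exc)
```

Raising instead of silently stripping characters keeps two different inputs from collapsing into the same ID.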

PEP 3131 covers Unicode identifiers in Python: [1].
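
Because Python identifiers may contain non-ASCII letters, the built-in str.isidentifier() is not an ASCII whitelist. The short sketch below illustrates this. The helper is_ascii_identifier is a hypothetical name, and str.isascii() requires Python 3.7 or later.

```python
candidates = ["dataset_01", "caf\u00e9", "p\u0430yload", "my-slug"]

for name in candidates:
    # isidentifier() follows the language's identifier rules,
    # which allow non-ASCII letters.
    print(f"{name!r}: isidentifier={name.isidentifier()} "
          f"isascii={name.isascii()}")


def is_ascii_identifier(name: str) -> bool:
    """Hypothetical helper: a Python identifier restricted to ASCII."""
    return name.isascii() and name.isidentifier()
```

Note that a slug such as "my-slug" is not a Python identifier at all, so identifier checks and slug checks answer different questions.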

Sources

[1] https://peps.python.org/pep-3131/

Tasks

Question 1

For a slug that must match [a-z0-9_-]+, which strategy scales better as Unicode attack surfaces grow?

Hint

Allow known-good, not “ban known-bad”.

Blacklist every punctuation character you can think of

Whitelist the small intended alphabet and reject all else

Disable validation because users are trusted

Use eval to normalise input

Question 2

Why is eval(user_input) unacceptable for parsing safe identifiers?

Hint

Never execute user strings.

eval is O(n)

eval executes arbitrary code, so there is no safety boundary

eval only works on integers

eval requires numpy

Question 3

Implement is_slug(s): return True only if s is non-empty and every character is an ASCII lowercase letter, a digit, _ or - (use the string module). No regex required.

Hint

Set membership per character.

import string

_ALLOWED = set(string.ascii_lowercase + string.digits + '_-')


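def is_slug(s: str) -> bool:
    """Return True if s is non-empty and every character is in _ALLOWED."""
    # TODO: replace with your solution
    pass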
