Normalising calculators and tokens
Beginner
Defensive APIs: validation, sanitization & exceptions
Created by Pavel · 29.04.2026 at 19:10 UTC
Human-entered calculator expressions often mix Unicode spaces, x for multiplication, and locale-specific decimal separators. Sanitisation canonicalises the representation: strip leading and trailing whitespace, fold case where appropriate, replace x with *, and collapse runs of whitespace. Validation then checks grammatical rules (for example, that operands and operators alternate) on the cleaned string.
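The two stages can be sketched as separate functions. This is a minimal illustration, not a reference implementation: the names sanitise and validate, the token grammar (unsigned decimal numbers and the four operators + - * /), and the choice to treat a comma as a decimal separator are all assumptions made for the example.

```python
import re
from typing import List

# Hypothetical helpers for illustration; names and grammar are assumptions.
_TOKEN = re.compile(r"\d+(?:\.\d+)?|[+\-*/]")
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def sanitise(expr: str) -> str:
    """Canonicalise the representation only; no rules are checked here."""
    expr = expr.strip().lower()      # strip() without arguments also trims Unicode whitespace
    expr = expr.replace("x", "*")    # x typed as a multiplication sign
    expr = expr.replace(",", ".")    # assumption: comma is a decimal separator, never a thousands separator
    return " ".join(expr.split())    # collapse every whitespace run to one ASCII space


def validate(clean: str) -> List[str]:
    """Check that operands and operators alternate; return the tokens."""
    tokens = _TOKEN.findall(clean)
    # Anything the tokeniser skipped is a character outside the grammar.
    if "".join(tokens) != clean.replace(" ", ""):
        raise ValueError(f"unexpected characters in {clean!r}")
    # A well-formed expression is operand (operator operand)*, so the count is odd.
    if len(tokens) % 2 == 0:
        raise ValueError(f"expression is empty or ends with an operator: {clean!r}")
    for i, tok in enumerate(tokens):
        is_operand = _NUMBER.fullmatch(tok) is not None
        if is_operand != (i % 2 == 0):
            raise ValueError(f"operands and operators must alternate: {clean!r}")
    return tokens
```

Keeping the stages apart means validate never has to reason about stray spaces or letter case, and its error messages always quote the canonical form.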
Order matters: checking length before stripping can wrongly accept ' ok ' (the padding counts towards a minimum length) or wrongly reject a valid short token (the padding pushes it past a maximum). Apply trimming first, then the rules, as in the V06 boundary-first principle.
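A small comparison makes both failure modes concrete. The limits MIN_LEN and MAX_LEN and the two function names are hypothetical, chosen only to show the effect of ordering.

```python
MIN_LEN = 3  # hypothetical limits for the illustration
MAX_LEN = 5


def check_then_strip(raw: str) -> str:
    """Wrong order: the length rule sees the padding."""
    if not (MIN_LEN <= len(raw) <= MAX_LEN):
        raise ValueError(f"length out of range: {raw!r}")
    return raw.strip()


def strip_then_check(raw: str) -> str:
    """Right order: trim at the boundary, then apply the rule."""
    token = raw.strip()
    if not (MIN_LEN <= len(token) <= MAX_LEN):
        raise ValueError(f"length out of range: {token!r}")
    return token
```

With these limits, check_then_strip(' ok ') returns 'ok' even though two characters is below the minimum, and it rejects '  abc  ' even though 'abc' is valid. strip_then_check gets both cases right.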
For ML feature engineering, the same pattern applies to messy categorical strings before hashing or embedding.
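As a sketch of that idea, the snippet below normalises a category label before hashing it into a bucket. The function names and the bucket count are assumptions for illustration; hashlib.sha256 is used only because it gives a hash that is stable across processes, unlike the built-in hash() for strings.

```python
import hashlib


def normalise_category(raw: str) -> str:
    """Trim, case-fold, and collapse whitespace so variants share one key."""
    return " ".join(raw.strip().casefold().split())


def hash_bucket(raw: str, n_buckets: int = 1024) -> int:
    """Map a messy label to a bucket index (hypothetical helper)."""
    key = normalise_category(raw)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a bucket.
    return int.from_bytes(digest[:8], "big") % n_buckets
```

Without the normalisation step, ' New  York ' and 'new york' would hash as unrelated categories; with it, they land in the same bucket.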
str.strip / str.replace docs: [1].
Sources