Normalising calculators and tokens

Beginner · Defensive APIs: validation, sanitization & exceptions
Created by Pavel · 29.04.2026 at 19:10 UTC

Human-entered expressions often mix Unicode spaces (such as non-breaking spaces), the letter x for multiplication, and locale-specific decimal separators (3,5 versus 3.5). Sanitisation canonicalises the representation: strip the ends, fold case where appropriate, replace x with *, and collapse internal whitespace. Validation then checks the grammatical rules, such as operands and operators alternating, on the cleaned string.
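A minimal sketch of the sanitise-then-validate pipeline, assuming a calculator whose operators are the single characters + - * / and whose operands are plain ASCII decimal numbers. The names `sanitise`, `is_alternating`, `_is_number` and `OPERATORS` are hypothetical, mapping the Unicode sign × to * is an added assumption, and decimal-separator handling is left out.

```python
# Hypothetical operator set for this sketch.
OPERATORS = {"+", "-", "*", "/"}


def sanitise(expr: str) -> str:
    """Canonicalise a hand-typed expression (illustrative rules only)."""
    # Fold case so 'X' and 'x' are treated alike, then map both x and the
    # Unicode multiplication sign (assumption) to '*'. This assumes the
    # expression contains no identifiers that legitimately include 'x'.
    s = expr.casefold().replace("x", "*").replace("\u00d7", "*")
    # str.split() with no argument splits on any run of Unicode whitespace
    # (including non-breaking spaces) and drops leading/trailing whitespace,
    # so joining with a single space both strips and collapses.
    return " ".join(s.split())


def _is_number(tok: str) -> bool:
    # Assumption: operands are ASCII digits with at most one '.'.
    return tok.isascii() and tok.replace(".", "", 1).isdecimal()


def is_alternating(expr: str) -> bool:
    """Validate operand, operator, operand, ... on a sanitised string."""
    tokens = expr.split(" ")
    if len(tokens) % 2 == 0:  # a valid sequence has an odd token count
        return False
    for i, tok in enumerate(tokens):
        if i % 2 == 0:
            if not _is_number(tok):  # even positions must be operands
                return False
        elif tok not in OPERATORS:  # odd positions must be operators
            return False
    return True
```

Validation only ever sees the canonical form: a hand-typed "3 X 4" with a non-breaking space becomes "3 * 4", which passes `is_alternating`, while "3 4 +" is rejected.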

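Order matters. Checking length before stripping also counts the padding: ' ok ' measures four characters, so it can pass a minimum-length rule that 'ok' should fail, or fail a maximum-length rule that 'ok' would pass. Apply trimming first, then the rules, as in the V06 boundary-first principle.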
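The ordering bug can be sketched in a few lines. Both helpers are hypothetical: `naive_length_ok` measures the raw input, while `validate_token` strips first and raises `ValueError` when the cleaned token breaks the (caller-chosen) length bounds.

```python
def naive_length_ok(raw: str, min_len: int, max_len: int) -> bool:
    # Anti-pattern: the length rule sees the padding as well as the token.
    return min_len <= len(raw) <= max_len


def validate_token(raw: str, min_len: int, max_len: int) -> str:
    """Sanitise at the boundary, then validate; return the cleaned token."""
    token = raw.strip()  # trimming happens before any rule runs
    if not min_len <= len(token) <= max_len:
        raise ValueError(f"token {token!r} must be {min_len}-{max_len} characters")
    return token
```

With bounds of 3 to 4, `naive_length_ok(" ok ", 3, 4)` accepts the padded token even though `validate_token` rejects 'ok'; with bounds of 1 to 2 the naive check rejects it while `validate_token` returns "ok".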

For ML feature engineering, the same pattern applies to messy categorical strings before hashing or embedding.
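A minimal sketch of the same idea for a hashed categorical feature. The normalisation rules (NFKC, case folding, whitespace collapse), the SHA-256 digest and the 1024-bucket default are all assumptions chosen for illustration; `normalise_category` and `bucket` are hypothetical names.

```python
import hashlib
import unicodedata


def normalise_category(value: str) -> str:
    # NFKC folds compatibility forms (e.g. full-width letters) to standard ones.
    s = unicodedata.normalize("NFKC", value)
    s = s.casefold()
    # Strip the ends and collapse internal whitespace runs to single spaces.
    return " ".join(s.split())


def bucket(value: str, n_buckets: int = 1024) -> int:
    # Hash the *normalised* string. hashlib gives a digest that is stable
    # across processes, unlike the salted built-in hash() for str.
    digest = hashlib.sha256(normalise_category(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets
```

Because normalisation runs before hashing, "  New York " and "new\u00a0york" land in the same bucket instead of becoming two unrelated features.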

Python documentation for str.strip and str.replace: [1].


Sources

Tasks
Question 1

You must validate that a hand-typed formula alternates operands and operators. Should you strip whitespace before or after checking token length rules?

Hint

Sanitise then validate.

Question 2

Replacing the multiplication symbol x with * before tokenising is primarily:

Hint

Transform, not reject.

Question 3

Write canon_calc(s: str) -> str that replaces lowercase x with *, strips both ends, and collapses internal runs of spaces to single spaces (no regex).

Hint

Collapse runs of spaces manually, after the replace.
