Why vectorised kernels beat naive Python loops numerically hot
Beginner
Accelerating numerics & developer hygiene
Created by Pavel
· 29.04.2026 at 19:11 UTC
NumPy (and pandas underneath) stores numeric data in contiguous, typed buffers and dispatches work to C/Fortran/BLAS loops. A Python for loop over a million floats pays per-iteration interpreter overhead and poor locality; np.sum(x) touches memory in tight machine code.
Amdahl’s law still applies: if 95% of time is I/O or Python orchestration, vectorising the inner 5% barely helps. Profile first, then vectorise the actual hotspot.
Broadcasting expresses outer operations without explicit Python nests—essential for feature matrices and batch norms.
NumPy quickstart: [1].
Sources
University approvals: 0
Tasks
Card Info
- Topic: Accelerating numerics & developer hygiene
- Difficulty: Beginner
- Completed: 0 users
Creator
Pavel