Memoization with lru_cache, safely
The most useful built-in decorator for data work is lru_cache, which memoizes: it remembers the result for each set of arguments, so repeated calls with the same input are instant lookups instead of repeated computation. It turns the exponentially wasteful naive Fibonacci into linear work:
from functools import lru_cache
@lru_cache(maxsize=None)
def fib(n):
return n if n < 2 else fib(n - 1) + fib(n - 2)
fib(35) now returns immediately instead of recomputing the same subproblems billions of times, and the same trick speeds up any expensive computation that's called repeatedly with the same inputs.
Caching is only correct when a function is pure: it always returns the same result for the same arguments and has no side effects. Caching something that depends on the current time, a database, or random state is a bug that hides until the cache hands back something stale.
lru_cache additionally requires the arguments to be hashable, so you can't memoize a function that takes a list or dict as an argument. Used on genuinely pure functions with hashable arguments, though, memoization is one of the cheapest large speed-ups available.
and leads into “A config class that validates itself”.*
Related cards
Tasks
Card Info
- Topic: Python for Data Science
- Difficulty: Advanced
- Completed: 0 users