Lazy evaluation and yield
Until now you've held whole datasets in memory. But some data doesn't fit — a file larger than your RAM, a never-ending stream. The answer is lazy evaluation: produce values only as they're asked for, instead of building a giant list up front.
The tool is the generator. Where a normal function computes everything and returns a finished list, a generator hands back one value at a time, on demand, keeping memory flat no matter how long the stream is.
The keyword that makes a generator is yield. A function that yields hands back one value, then pauses, remembering exactly where it was, and resumes from there on the next request:
def squares(n):
for i in range(n):
yield i * i
for s in squares(5):
print(s) # 0 1 4 9 16, produced one at a time
Nothing is computed until the loop asks, and only one value exists at a time. A generator keeps its own state between requests, which is what makes it perfect for streaming.
“Chunked reading and single-pass generators”.*
Related cards
Tasks
Card Info
- Topic: Python for Data Science
- Difficulty: Advanced
- Completed: 0 users