Educational Cards
Learn from video content, text, and interactive tasks
Encoder vs decoder masking
Not every transformer attends the same way. Encoder blocks (BERT-style) use bidirectional...
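A minimal sketch of the two masking patterns, assuming the common convention that encoder blocks let every position see every other position while decoder (GPT-style) blocks use a causal, lower-triangular mask. The helpers `build_mask` and `masked_softmax` are hypothetical names, written in plain Python to stay dependency-free.

```python
import math


def build_mask(n, causal):
    """Return an n x n boolean mask; mask[i][j] is True if query i may attend to key j."""
    # Encoder-style (causal=False): every position sees every position.
    # Decoder-style (causal=True): position i sees only positions j <= i.
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]


def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores, giving zero weight to masked keys."""
    # Masked keys get -inf, so exp() maps them to exactly 0.0.
    masked = [s if ok else -math.inf for s, ok in zip(scores, allowed)]
    m = max(masked)  # finite as long as one key is allowed (the diagonal always is)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 3.0]  # toy scores for query 0 against keys 0..2
for causal in (False, True):
    mask = build_mask(3, causal)
    print("causal" if causal else "bidirectional", masked_softmax(scores, mask[0]))
```

Under the causal mask the first token can only attend to itself, so its weights collapse to `[1.0, 0.0, 0.0]`; the bidirectional row spreads weight over all three keys.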
Transformer block: attention + MLP + residuals + norm
A transformer block stacks two sublayers around a residual highway. First, multi-head...
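A compact NumPy sketch of one block, assuming the pre-norm arrangement (normalise, apply the sublayer, add the result back to the residual stream); the original Transformer placed the norm after each residual addition instead. Learned norm gains and biases, dropout, masking and bias terms are omitted, and `transformer_block` and the `params` keys are hypothetical names.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalise each token vector (last axis) to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    n, d = x.shape
    dh = d // n_heads
    # Project, then split the model dimension into n_heads slices of width dh.
    q = (x @ Wq).reshape(n, n_heads, dh).transpose(1, 0, 2)  # (heads, n, dh)
    k = (x @ Wk).reshape(n, n_heads, dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(n, n_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, n, n)
    heads = softmax(scores) @ v  # (heads, n, dh)
    # Concatenate the heads back to (n, d) and mix them with the output projection.
    return heads.transpose(1, 0, 2).reshape(n, d) @ Wo


def mlp(x, W1, W2):
    # Position-wise feed-forward: expand, apply ReLU, project back to d.
    return np.maximum(0.0, x @ W1) @ W2


def transformer_block(x, p, n_heads):
    # Pre-norm: each sublayer reads a normalised copy of x and adds its output back.
    x = x + multi_head_attention(layer_norm(x), p["Wq"], p["Wk"], p["Wv"], p["Wo"], n_heads)
    x = x + mlp(layer_norm(x), p["W1"], p["W2"])
    return x


n, d, d_ff, n_heads = 4, 8, 32, 2
shapes = {"Wq": (d, d), "Wk": (d, d), "Wv": (d, d), "Wo": (d, d), "W1": (d, d_ff), "W2": (d_ff, d)}
rng = np.random.default_rng(0)
params = {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}
x = rng.normal(size=(n, d))
y = transformer_block(x, params, n_heads)
print(y.shape)  # (4, 8): the block preserves the (tokens, d_model) shape
```

Because each sublayer only adds to `x`, setting every weight to zero turns the block into an identity map, which is a quick sanity check on the residual wiring.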
Positional encodings and length generalization
Attention scores depend on content vectors alone unless you tell the model where each token sits in...
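One classic way to inject position is the fixed sinusoidal encoding from the original Transformer paper, sketched here in plain Python. It is an illustration, not necessarily the scheme this card goes on to recommend; learned, relative and rotary encodings are common alternatives. The function name `sinusoidal_encoding` is hypothetical.

```python
import math


def sinusoidal_encoding(n_positions, d_model):
    """Fixed sinusoidal position vectors: sin on even dimensions, cos on odd ones."""
    pe = []
    for pos in range(n_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            k = i // 2
            angle = pos / (10000 ** (2 * k / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


pe = sinusoidal_encoding(n_positions=6, d_model=4)
token = [0.5, -0.2, 0.1, 0.3]  # the same content vector placed at two positions
at_pos_1 = [t + p for t, p in zip(token, pe[1])]
at_pos_4 = [t + p for t, p in zip(token, pe[4])]
print(at_pos_1 != at_pos_4)  # True: after adding the encoding, attention can tell them apart
```

Since the formula can be evaluated at any position, it yields vectors for positions longer than any seen in training; whether a model actually generalises to those lengths is a separate, empirical question.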
Self-attention mixes tokens via learned compatibilities
Chapter 5 framed language as next-token prediction; this chapter asks how a model can mix...
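The core of self-attention fits in a few lines: project tokens to queries, keys and values, score each query against every key with a scaled dot product, softmax the scores, and average the values with those weights. A plain-Python sketch with a single head and no mask; the helper names are hypothetical and the weight matrices stand in for the learned projections.

```python
import math


def matmul(a, b):
    # (n x k) @ (k x m) on nested lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention; returns (outputs, weights)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    # Compatibility of query i with key j, scaled by sqrt(d_k).
    scores = [[sum(qi * kj for qi, kj in zip(q[i], k[j])) / math.sqrt(d_k)
               for j in range(len(k))] for i in range(len(q))]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value rows.
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = self_attention(tokens, identity, identity, identity)
for row in w:
    print([round(p, 3) for p in row])
```

Changing `wq` and `wk` changes which tokens look compatible; that is the sense in which the mixing is learned.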
Bridge to transformer mechanisms
Language models before transformers often used RNNs: a hidden state updated serially along time,...
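To make the serial dependency concrete, here is a scalar Elman-style RNN in plain Python. It is a simplified, hypothetical helper (real RNNs use vector states and weight matrices). Each step needs the previous hidden state, so the loop over time cannot be parallelised the way attention over positions can.

```python
import math


def rnn_forward(xs, w_h, w_x, b, h0=0.0):
    """Scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t + b). Returns all hidden states."""
    h = h0
    states = []
    for x in xs:
        # Step t cannot start until step t-1 has produced h: the serial bottleneck.
        h = math.tanh(w_h * h + w_x * x + b)
        states.append(h)
    return states


# Same inputs in a different order give a different final state: order lives in h.
print(rnn_forward([1.0, 0.0], w_h=0.5, w_x=1.0, b=0.0))
print(rnn_forward([0.0, 1.0], w_h=0.5, w_x=1.0, b=0.0))
```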
Memorization, privacy, and copyright pressure points
Large models can memorize rare training sequences, including private or licensed text....
Profiling before rewriting
cProfile, py-spy, and IDE profilers show where time goes: function call counts and cumulative...
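A minimal sketch of profiling a single call with the standard-library `cProfile` and `pstats` modules. `profile_call` and the workload functions are hypothetical. Sorting by cumulative time surfaces the call paths where time accumulates, before you decide what is worth rewriting.

```python
import cProfile
import io
import pstats


def parse_rows(n):
    # Hypothetical workload step 1: build string "records".
    return [str(i) for i in range(n)]


def score_rows(rows):
    # Hypothetical workload step 2: parse them back and aggregate.
    return sum(int(r) ** 2 for r in rows)


def pipeline(n):
    return score_rows(parse_rows(n))


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    buf = io.StringIO()
    # Sort by cumulative time so callers that own the slow paths rise to the top.
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


result, report = profile_call(pipeline, 100_000)
print(result)
print(report)
```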
Linters versus formatters
Formatters (Black, isort) rewrite layout—whitespace, line breaks, import order—without changing...
Docker reproducibility anecdotes
Containers pin OS-level dependencies and, with lock files, Python package versions so collaborators...
Testing discipline vs folklore speedups
Unit tests lock behaviour while you refactor: changing a feature encoder should not silently shift...
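A sketch of that discipline with the standard-library `unittest` module: a golden-value test pins a feature encoder's output, so a refactor that silently changes the encoding fails immediately. `encode_features` and its colour/price fields are hypothetical.

```python
import unittest


def encode_features(row):
    """Hypothetical feature encoder: one-hot colour followed by a scaled price."""
    colours = ["red", "green", "blue"]
    one_hot = [1.0 if row["colour"] == c else 0.0 for c in colours]
    return one_hot + [row["price"] / 100.0]


class TestEncodeFeatures(unittest.TestCase):
    def test_known_row_is_stable(self):
        # Golden value: if a refactor changes this output, the test fails loudly.
        self.assertEqual(encode_features({"colour": "green", "price": 250}),
                         [0.0, 1.0, 0.0, 2.5])

    def test_output_length_is_fixed(self):
        # Downstream models expect a fixed-width vector.
        self.assertEqual(len(encode_features({"colour": "blue", "price": 0})), 4)
```

Saved as a module, this runs with `python -m unittest <module_name>`. Any claimed speedup to the encoder then has to pass these tests before it counts.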
A simple timing decorator (wraps-preserving ergonomics)
Decorators that add timing/logging should use functools.wraps so __name__, __doc__, and...
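A minimal sketch of such a decorator, assuming wall-clock timing printed to stdout is enough (swap `print` for `logging` in real code); `timed` is a hypothetical name. `functools.wraps` copies the wrapped function's metadata onto the wrapper and records the original as `__wrapped__`.

```python
import functools
import time


def timed(func):
    """Print how long each call to func takes, keeping func's metadata intact."""
    @functools.wraps(func)  # wrapper now reports func's __name__, __doc__, __qualname__, ...
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__qualname__} took {elapsed:.6f}s")
    return wrapper


@timed
def add(a, b):
    """Return a + b."""
    return a + b


print(add(2, 3), add.__name__, add.__doc__)
```

Without `@functools.wraps`, `add.__name__` would read `"wrapper"` and the docstring would be lost, which confuses debuggers, `help()` and documentation tools.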
Why vectorised kernels beat naive Python loops in numerically hot code
NumPy (and pandas underneath) stores numeric data in contiguous, typed buffers and dispatches work...
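A small experiment, assuming NumPy is installed, computes the same dot product two ways: as a pure-Python loop and as a single `np.dot` call. The loop goes through the interpreter and creates new float objects on every iteration. `np.dot` hands the whole contiguous buffer to compiled code. Exact speedups depend on hardware and array size, so the script prints timings rather than promising a number. `loop_dot` and `vector_dot` are hypothetical names.

```python
import timeit

import numpy as np


def loop_dot(a, b):
    # Pure Python: each iteration is interpreted and allocates new float objects.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def vector_dot(a, b):
    # One call: NumPy walks the contiguous float64 buffers in compiled code.
    return float(np.dot(a, b))


rng = np.random.default_rng(0)
a = rng.random(100_000)
b = rng.random(100_000)
a_list, b_list = a.tolist(), b.tolist()  # plain Python floats for the loop version

if __name__ == "__main__":
    t_loop = timeit.timeit(lambda: loop_dot(a_list, b_list), number=10)
    t_vec = timeit.timeit(lambda: vector_dot(a, b), number=10)
    print(f"loop: {t_loop:.4f}s  numpy: {t_vec:.4f}s  ratio: {t_loop / t_vec:.0f}x")
```

The two results agree only up to floating-point rounding, because the summation order differs. Compare them with a tolerance, not `==`.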