Python as Lingua Franca for Data Scientists

Intermediate Python for Data Science

by Best

Learn Python as the working language of data science, from the ground up. The course covers tables, types and reading files; vectors and matrices; NumPy and pandas; plotting and data validation; evaluating a model honestly, with the right metrics and an eye on uncertainty; modelling with scikit-learn; and the professional toolkit (generators, recursion, decorators, classes, graphs and reproducible environments). It ends in a reproducible end-to-end project. Each topic is a module of short, focused lessons. No prior Python required.

University approvals: 1 (ZHAW - Zürcher Hochschule für Angewandte Wissenschaften: 1)

How a program holds a dialogue with you, turning typed text into numbers, Python's basic types, and how a table of records looks before any library.

Why a column's type is a statement about meaning, parsing strings into the right type, and packaging that work in a named function.
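As a taste of this module, here is a minimal sketch of parsing a text column into a numeric type inside a named function. The function name `parse_age`, the sample values, and the rule that negative ages count as invalid are assumptions made for illustration, not course material.

```python
from typing import Optional


def parse_age(raw: str) -> Optional[int]:
    """Parse a raw text field into a non-negative integer age.

    Returns None when the field is empty, not a number, or negative,
    so that "missing" is never silently turned into 0.
    """
    text = raw.strip()
    if text == "":
        return None
    try:
        age = int(text)
    except ValueError:
        return None
    return age if age >= 0 else None


# Hypothetical raw column, as it might arrive from a text file.
raw_column = ["34", " 51 ", "", "n/a", "-3"]
ages = [parse_age(value) for value in raw_column]
print(ages)  # [34, 51, None, None, None]
```

Giving the parsing step a name keeps the decision about what counts as a valid age in one place.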

Treating a column of numbers as a vector and doing the arithmetic by hand, to feel the cost before NumPy.
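A small sketch of what "arithmetic by hand" can look like: a dot product written with an explicit loop over two plain lists. The names `dot`, `prices`, and `quantities` are illustrative assumptions.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Dot product of two equal-length vectors, one multiplication at a time."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        total += a * b  # one Python-level multiply-add per element
    return total


prices = [2.0, 3.0, 4.0]
quantities = [10.0, 0.0, 5.0]
print(dot(prices, quantities))  # 40.0
```

Every element passes through the interpreter individually here, which is the cost that a vectorised library call avoids.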

Two-dimensional data as a list of rows, and the nested loops that traverse it.
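For example, a matrix stored as a list of rows can be traversed with nested loops to get per-column sums, or rebuilt column by column to transpose it. This is a sketch with made-up values, assuming every row has the same length.

```python
# A 2 x 3 matrix as a list of rows (illustrative values).
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

n_rows = len(matrix)
n_cols = len(matrix[0])  # assumes a non-empty, rectangular matrix

# Outer loop over rows, inner loop over the values in each row.
column_sums = [0] * n_cols
for row in matrix:
    for j, value in enumerate(row):
        column_sums[j] += value

# Transpose: element (i, j) of the original becomes element (j, i).
transposed = [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]

print(column_sums)  # [5, 7, 9]
print(transposed)   # [[1, 4], [2, 5], [3, 6]]
```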

Functions as the organising unit of a project: typed, documented, composable transforms.

Type hints and a numeric helper
Project paths and feature pipelines

Split-by-key-then-reduce in plain Python with dict, Counter, and defaultdict.

Counting and accumulating per key
Ranking, ties, and toward groupby
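The split-by-key-then-reduce pattern can be sketched with the standard library alone. The `sales` records and region names below are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (region, amount).
sales = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 30.0),
    ("east", 50.0),
    ("south", 20.0),
]

# Split: collect each region's amounts in its own list.
groups = defaultdict(list)
for region, amount in sales:
    groups[region].append(amount)

# Reduce: one summary value per key.
totals = {region: sum(amounts) for region, amounts in groups.items()}

# Counting per key is common enough to have its own tool.
counts = Counter(region for region, _ in sales)

# Rank regions by total, largest first.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(totals)          # {'north': 150.0, 'south': 100.0, 'east': 50.0}
print(counts["north"]) # 2
print(ranking)         # [('north', 150.0), ('south', 100.0), ('east', 50.0)]
```

This is the same shape of computation that a pandas `groupby` followed by an aggregation performs on a DataFrame.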

The ndarray: one typed, shaped object that replaces nested lists and loops.

Combining arrays of different shapes without loops, and using conditions as filters, the core analyst pattern.

The table object analysts live in: named columns, filtering, grouped aggregation, and joins.

Looking at data before modelling: the matplotlib figure/axes model and choosing the right chart.

The boundary of your program: parsing JSON, handling missing values, and validating untrusted input.

What a good prediction is, the confusion matrix, the precision/recall trade-off, and why accuracy lies for rare classes.
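To see why accuracy can mislead on rare classes, consider a sketch in which only 5 of 100 cases are positive and a "model" always predicts negative. The data are made up, and returning 0.0 when a denominator is zero is a convention chosen here, not the only option.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels 0 and 1."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_divide(numerator, denominator):
    """Return 0.0 instead of failing when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# 100 cases, only 5 of them positive (a rare class).
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, always_negative)
accuracy = safe_divide(tp + tn, len(y_true))
precision = safe_divide(tp, tp + fp)
recall = safe_divide(tp, tp + fn)

print(accuracy)   # 0.95
print(precision)  # 0.0
print(recall)     # 0.0
```

The classifier scores 95% accuracy while finding none of the positive cases, which is exactly what recall exposes.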

Every metric is an estimate: confidence intervals, sample size, and the traps that make 'wins' fail to replicate.
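One way to make "every metric is an estimate" concrete is to put an interval around a proportion. The sketch below uses the normal-approximation (Wald) interval, which is a simple choice for illustration and is known to behave poorly for small samples or proportions near 0 or 1. The function name and the counts are assumptions.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for a proportion (normal approximation)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1], since a proportion cannot leave that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Same observed rate (80%), very different sample sizes.
small = proportion_ci(8, 10)
large = proportion_ci(800, 1000)

print(f"n=10:   {small[0]:.3f} to {small[1]:.3f}")  # n=10:   0.552 to 1.000
print(f"n=1000: {large[0]:.3f} to {large[1]:.3f}")  # n=1000: 0.775 to 0.825
```

The same 80% reads very differently once the interval is shown: with ten observations it is compatible with anything from roughly 55% to 100%.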

The uniform fit/predict interface, the train/test split, and the leakage that quietly inflates scores.

Making slow code fast the disciplined way: measure first, then vectorise, then reach for Numba or a better algorithm.

Measure first, then vectorise
Numba and better algorithms

Processing data that does not fit in memory, one item or one chunk at a time, with yield.

Lazy evaluation and yield
Chunked reading and single-pass generators
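A minimal sketch of chunked processing with `yield`: the generator hands out one small list of lines at a time, so only the current chunk is held in memory. The name `read_in_chunks` is hypothetical, and an in-memory `io.StringIO` stands in for a large file.

```python
import io
from typing import Iterable, Iterator


def read_in_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most chunk_size stripped lines."""
    chunk = []
    for line in lines:
        chunk.append(line.rstrip("\n"))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # the final chunk may be shorter
        yield chunk


# Stand-in for a file too large to load at once.
fake_file = io.StringIO("3\n5\n8\n13\n21\n")

running_total = 0
for chunk in read_in_chunks(fake_file, chunk_size=2):
    running_total += sum(int(value) for value in chunk)

print(running_total)  # 50
```

Because a generator is consumed as it is iterated, a second loop over the same generator object would see nothing; that is the single-pass behaviour the lesson title refers to.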

Walking hierarchical data with recursion and with an explicit stack, and why CPython's recursion limit matters.
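The two traversal styles can be compared on a small nested dictionary. The tree below and both function names are illustrative; the recursion limit printed at the end depends on the interpreter and its settings (it is typically 1000 in CPython).

```python
import sys

# A small hierarchy: each node has a name and a list of children.
tree = {
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1", "children": []}]},
        {"name": "b", "children": []},
    ],
}


def names_recursive(node):
    """Depth-first walk that uses the call stack."""
    result = [node["name"]]
    for child in node["children"]:
        result.extend(names_recursive(child))
    return result


def names_with_stack(node):
    """The same depth-first walk with an explicit stack instead of recursion."""
    result = []
    stack = [node]
    while stack:
        current = stack.pop()
        result.append(current["name"])
        # Push children in reverse so the leftmost child is popped first.
        stack.extend(reversed(current["children"]))
    return result


print(names_recursive(tree))   # ['root', 'a', 'a1', 'b']
print(names_with_stack(tree))  # ['root', 'a', 'a1', 'b']
print(sys.getrecursionlimit())
```

The recursive version fails with `RecursionError` on data nested more deeply than the limit allows, while the explicit stack is bounded only by available memory.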

Wrapping behaviour around a function without changing it: timing, logging, and caching pure computations.
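As a sketch of both ideas, the block below defines a hand-written timing decorator and uses the standard library's `functools.lru_cache` to memoize a pure function. The names `timed`, `slow_sum`, and `fib` are chosen for illustration.

```python
import functools
import time


def timed(func):
    """Print how long each call to func takes, without changing its result."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.6f} s")
    return wrapper


@timed
def slow_sum(n: int) -> int:
    return sum(range(n))


@functools.lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Pure function: same input, same output, so caching is safe."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(slow_sum(1_000_000))  # 499999500000, preceded by a timing line
print(fib(30))              # 832040
```

Caching is only appropriate for pure computations: a function that reads a file or depends on the current time would return stale results from the cache.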

Using a class to gather a pipeline's settings into one object that validates itself and fails loudly and early.
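One possible shape for such an object is a frozen dataclass that checks its own fields in `__post_init__`. The class name, field names, defaults, and the path `data/raw.csv` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """All settings for one pipeline run, validated at construction time."""
    input_path: str
    test_fraction: float = 0.2
    random_seed: int = 42

    def __post_init__(self):
        # Fail loudly and early: reject bad settings before any work starts.
        if not self.input_path:
            raise ValueError("input_path must not be empty")
        if not 0.0 < self.test_fraction < 1.0:
            raise ValueError(
                f"test_fraction must be between 0 and 1, got {self.test_fraction}"
            )


config = PipelineConfig(input_path="data/raw.csv")
print(config.test_fraction)  # 0.2

try:
    PipelineConfig(input_path="data/raw.csv", test_fraction=1.5)
except ValueError as err:
    print(f"rejected: {err}")
```

With `frozen=True` the settings cannot be changed after construction, so a config that passed validation stays valid for the whole run.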

Modelling relationships as graphs with NetworkX, and pinning an environment so others can re-run your work.

Every topic at once: from a raw file to a defended, reproducible conclusion.

1. Talking to Python: print, input, and basic types (2 items)

2. Types as measurement scales, and your first function (2 items)

3. Lists as vectors: numeric work with loops (2 items)

4. Matrices as nested lists (2 items)

5. Functions, type hints, and feature helpers (2 items)

6. Grouping and aggregating without pandas (2 items)

7. NumPy arrays: dtype, shape, and slicing (2 items)

8. NumPy broadcasting and boolean masks (2 items)

9. pandas: DataFrames, groupby, and joins (2 items)

10. Plotting for exploratory data analysis (2 items)

11. JSON, validation, and missing data (2 items)

12. Judging a classifier: precision, recall, and F1 (2 items)

13. Uncertainty, A/B testing, and multiple comparisons (2 items)

14. scikit-learn: the estimator API and the split (2 items)

15. Profiling and speeding up hot code (2 items)

16. Generators and lazy, chunked streams (2 items)

17. Stacks, recursion, and trees (2 items)

18. Decorators and memoization (2 items)

19. Object-oriented configuration (2 items)

20. Graphs and reproducible environments (2 items)

21. Capstone: a reproducible end-to-end project (2 items)