Python as Lingua Franca for Data Scientists
Intermediate
Python for Data Science
by Best
Learn Python as the working language of data science, from the ground up: tables, types and reading files; vectors and matrices; NumPy and pandas; plotting and data validation; how to evaluate a model honestly with the right metrics and an eye on uncertainty; modelling with scikit-learn; and the professional toolkit (generators, recursion, decorators, classes, graphs and reproducible environments), ending in a reproducible end-to-end project. Each topic is a module of short, focused lessons. No prior Python required.
University approvals: 1
(ZHAW - Zürcher Hochschule für Angewandte Wissenschaften: 1)
How a program holds a dialogue with you, how typed text becomes numbers, Python's basic types, and what a table of records looks like before any library.
Why a column's type is a statement about meaning, parsing strings into the right type, and packaging that work in a named function.
Treating a column of numbers as a vector and doing the arithmetic by hand, to feel the cost before NumPy.
Two-dimensional data as a list of rows, and the nested loops that traverse it.
Functions as the organising unit of a project: typed, documented, composable transforms.
Split-by-key-then-reduce in plain Python with dict, Counter, and defaultdict.
The ndarray -- one typed, shaped object that replaces nested lists and loops.
Combining arrays of different shapes without loops, and using conditions as filters -- the core analyst pattern.
The table object analysts live in: named columns, filtering, grouped aggregation, and joins.
Looking at data before modelling: the matplotlib figure/axes model and choosing the right chart.
The boundary of your program: parsing JSON, handling missing values, and validating untrusted input.
What a good prediction is, the confusion matrix, the precision/recall trade-off, and why accuracy lies for rare classes.
Every metric is an estimate: confidence intervals, sample size, and the traps that make 'wins' fail to replicate.
The uniform fit/predict interface, the train/test split, and the leakage that quietly inflates scores.
Making slow code fast the disciplined way: measure first, then vectorise, then reach for Numba or a better algorithm.
Processing data that does not fit in memory, one item or one chunk at a time, with yield.
Walking hierarchical data with recursion and with an explicit stack, and why CPython's recursion limit matters.
Wrapping behaviour around a function without changing it -- timing, logging, and caching pure computations.
Using a class to gather a pipeline's settings into one object that validates itself and fails loudly and early.
Modelling relationships as graphs with NetworkX, and pinning an environment so others can re-run your work.
Every topic at once: from a raw file to a defended, reproducible conclusion.
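As a taste of the plain-Python modules, the split-by-key-then-reduce pattern with Counter and defaultdict might look like the sketch below. The records and the names `records`, `counts`, `groups`, `totals` and `means` are invented for illustration and are not taken from the course material.

```python
from collections import Counter, defaultdict

# Hypothetical records: (city, order value) pairs made up for this sketch.
records = [
    ("Zurich", 12.0),
    ("Bern", 7.5),
    ("Zurich", 3.0),
    ("Basel", 9.0),
    ("Bern", 2.5),
]

# Counter: count how many records fall under each key.
counts = Counter(city for city, _ in records)

# defaultdict: split the values by key...
groups = defaultdict(list)
for city, value in records:
    groups[city].append(value)

# ...then reduce each group to a single number.
totals = {city: sum(values) for city, values in groups.items()}
means = {city: totals[city] / counts[city] for city in totals}

print(counts)
print(totals)
print(means)
```

The same three steps (split, apply, combine) are what a grouped aggregation in pandas performs; writing them out by hand once makes clear what the library call is doing.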