Why look first, and the figure/axes model
Exploratory data analysis (EDA) is the habit of plotting before modelling, because a picture reveals skew, outliers, gaps and structure that no single summary number will. A column with an average of 50 might be tightly clustered there or split evenly between 0 and 100 — only a plot tells you which.
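To make the "average of 50" point concrete, here is a minimal sketch with two made-up columns (`clustered` and `split` are illustrative names and values, not data from this card). Both have a mean of 50, yet their histograms look nothing alike:

```python
import statistics

import matplotlib
matplotlib.use("Agg")  # render to a file; no screen needed
import matplotlib.pyplot as plt

# Two made-up columns with the same mean but very different shapes.
clustered = [48, 49, 50, 50, 50, 50, 51, 52]  # tightly packed around 50
split = [0, 0, 0, 0, 100, 100, 100, 100]      # half at 0, half at 100

mean_clustered = statistics.mean(clustered)
mean_split = statistics.mean(split)
print(mean_clustered, mean_split)  # the two means are identical

# One figure, two axes side by side, sharing the same 0..100 scale.
fig, (ax_left, ax_right) = plt.subplots(1, 2, sharex=True, figsize=(8, 3))
ax_left.hist(clustered, bins=20, range=(0, 100))
ax_left.set_title("Clustered near 50")
ax_right.hist(split, bins=20, range=(0, 100))
ax_right.set_title("Split between 0 and 100")
for ax in (ax_left, ax_right):
    ax.set_xlabel("value")
    ax.set_ylabel("count")
fig.tight_layout()
fig.savefig("same_mean.png", dpi=120)
```

The summary number cannot separate the two columns; the saved image can.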
Many of the data problems that derail a model are visible in a five-minute EDA pass: a long tail that distorts the mean, a clump of impossible values, a column that is half missing. Looking first is cheaper than debugging a model that silently trained on dirty data.
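Once a plot has pointed at a problem, a few counts can confirm it. The sketch below uses a hypothetical `raw` column, assuming scores should fall in 0..100 and that `None` marks a missing entry; both the data and those conventions are assumptions for illustration:

```python
import statistics

# Hypothetical raw column: scores expected in 0..100, None marks a missing entry.
raw = [12, 15, 14, None, 13, 16, 980, None, -5, 14, None, 15]

# Share of the column that is missing.
present = [v for v in raw if v is not None]
missing_share = (len(raw) - len(present)) / len(raw)

# Values outside the range the column is supposed to have.
impossible = [v for v in present if not 0 <= v <= 100]

# A single extreme value drags the mean far from the median.
mean_value = statistics.mean(present)
median_value = statistics.median(present)

print(f"missing: {missing_share:.0%}")
print(f"impossible values: {impossible}")
print(f"mean={mean_value:.1f}, median={median_value}")
```

A large gap between mean and median is a common hint that a tail or a bad value is present, which is exactly what a histogram would show at a glance.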
matplotlib gives you explicit control through two objects: a figure (the canvas) and an axes (the plot inside it). You draw onto the axes, label it, and save it:
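import matplotlib
matplotlib.use("Agg")  # render to a file; no screen needed
import matplotlib.pyplot as plt

values = [52, 48, 50, 95, 3, 51, 49, 47, 100, 0]  # placeholder scores; substitute your own data

fig, ax = plt.subplots()           # fig is the canvas, ax is the plot inside it
ax.hist(values, bins=20)           # draw onto the axes
ax.set_xlabel("score")
ax.set_ylabel("count")
ax.set_title("Score distribution")
fig.savefig("dist.png", dpi=120)   # saving goes through the figure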
Always label the axes and give the plot a title — an unlabelled chart isn't evidence anyone can act on.
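The figure/axes split becomes clearer when one figure holds more than one axes. The following is a minimal sketch, assuming a made-up `scores` list and an arbitrary output file name, that draws the same data at two bin widths and labels each axes separately:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no screen needed
import matplotlib.pyplot as plt

# Made-up scores, used only to have something to draw.
scores = [35, 42, 48, 50, 51, 53, 55, 58, 60, 61, 64, 70, 72, 88, 97]

# One figure (the canvas) holding two axes (two separate plots).
fig, (ax_coarse, ax_fine) = plt.subplots(1, 2, figsize=(8, 3))

ax_coarse.hist(scores, bins=5)
ax_coarse.set_xlabel("score")
ax_coarse.set_ylabel("count")
ax_coarse.set_title("5 bins")

ax_fine.hist(scores, bins=20)
ax_fine.set_xlabel("score")
ax_fine.set_ylabel("count")
ax_fine.set_title("20 bins")

fig.suptitle("Score distribution")  # a title for the whole canvas
fig.tight_layout()
fig.savefig("dist_bins.png", dpi=120)
```

Labels and per-plot titles belong to each axes, while the overall title and the save call belong to the figure.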
This card leads into “Choosing charts and headless plotting”.
Card Info
- Topic: Python for Data Science
- Difficulty: Intermediate