Why look first, and the figure/axes model

Intermediate Python for Data Science
Created by Best · 24.06.2026 at 14:03 UTC

Exploratory data analysis (EDA) is the habit of plotting before modelling, because a picture reveals skew, outliers, gaps and structure that no single summary number will. A column with an average of 50 might be tightly clustered there or split evenly between 0 and 100 — only a plot tells you which.
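To make the "average of 50" point concrete, here is a minimal sketch that builds two hypothetical columns with the same mean. One is tightly clustered and the other is split between 0 and 100. It uses NumPy to generate the data, which is an assumption since the card itself only shows matplotlib. The printed means agree, while the saved histograms show two very different shapes.

```python
import matplotlib
matplotlib.use("Agg")  # write to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

# Two hypothetical columns, both averaging about 50
clustered = rng.normal(loc=50, scale=2, size=1000)             # tight around 50
split = np.concatenate([np.zeros(500), np.full(500, 100.0)])  # half 0, half 100

# The summary numbers alone: identical-looking means
for name, col in [("clustered", clustered), ("split", split)]:
    print(f"{name:>9}: mean={col.mean():.1f}  std={col.std():.1f}")

# The picture: two histograms side by side on a shared x axis
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3), sharex=True)
ax_left.hist(clustered, bins=20)
ax_left.set_title("clustered")
ax_right.hist(split, bins=20)
ax_right.set_title("split")
for ax in (ax_left, ax_right):
    ax.set_xlabel("value")
    ax.set_ylabel("count")
fig.savefig("same_mean.png", dpi=120)
plt.close(fig)  # release the figure after saving
```

The standard deviation hints at the difference, but only the histogram shows that the "split" column has no values near 50 at all.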

Most modelling mistakes are visible in a five-minute EDA pass: a long tail that breaks a mean, a clump of impossible values, a column that is half missing. Looking first is cheaper than debugging a model that silently trained on dirty data.
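A five-minute pass can start with a few pandas one-liners before any plotting. The sketch below assumes pandas and a small hypothetical DataFrame seeded with the three problems named above: a long-tailed `income`, an `age` column with a clump of impossible values, and a `phone` column that is roughly half missing. The column names and the 0 to 120 plausible-age range are illustrative assumptions, not rules.

```python
import numpy as np
import pandas as pd

# Hypothetical data seeded with a long tail, impossible values and missingness
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "income": rng.lognormal(mean=10, sigma=1, size=n),        # long right tail
    "age": rng.integers(18, 90, size=n).astype(float),         # plausible ages
    "phone": np.where(rng.random(n) < 0.5, np.nan, 1.0),      # about half missing
})
df.loc[:9, "age"] = -1  # rows 0..9 (inclusive): a clump of impossible ages

# 1. Long tail: a mean well above the median is a warning sign
print(df["income"].agg(["mean", "median"]))

# 2. Impossible values: count rows outside an assumed plausible range
impossible = ~df["age"].between(0, 120)
print("impossible ages:", impossible.sum())

# 3. Missingness: fraction of NaN in each column
missing = df.isna().mean()
print(missing)
```

Each check points at a column worth plotting next; none of them replaces looking at the distribution itself.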

matplotlib gives you explicit control through two objects: a figure (the canvas) and an axes (the plot inside it). You draw onto the axes, label it, and save it:

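```python
import matplotlib
matplotlib.use("Agg")          # render to a file; no screen needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample scores so the snippet runs on its own
values = np.random.default_rng(0).normal(50, 15, size=500)

fig, ax = plt.subplots()       # fig is the canvas, ax is the plot inside it
ax.hist(values, bins=20)
ax.set_xlabel("score")
ax.set_ylabel("count")
ax.set_title("Score distribution")
fig.savefig("dist.png", dpi=120)
plt.close(fig)                 # free the figure once it is saved
```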
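The split between figure and axes becomes clearer when one figure holds more than one axes. The minimal sketch below, with hypothetical scores, asks `plt.subplots(1, 2)` for a single canvas containing two plots. Each axes is drawn and labelled independently, and the figure is saved once. The choice of a histogram next to a box plot is just one illustrative pairing.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no screen needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical scores, for illustration only
rng = np.random.default_rng(1)
values = rng.normal(60, 12, size=300)

# One figure (the canvas) holding two axes (the plots) side by side
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

ax_hist.hist(values, bins=20)
ax_hist.set_xlabel("score")
ax_hist.set_ylabel("count")
ax_hist.set_title("Histogram")

ax_box.boxplot(values)
ax_box.set_xlabel("all students")
ax_box.set_ylabel("score")
ax_box.set_title("Box plot")

fig.suptitle("Score distribution, two views")  # a title for the whole canvas
fig.tight_layout()                              # keep labels from overlapping
fig.savefig("dist_two_views.png", dpi=120)      # saving is a figure-level action
plt.close(fig)
```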

Always label the axes and give the plot a title — an unlabelled chart isn't evidence anyone can act on.
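One way to make the labelling habit hard to forget is to check it before saving. The sketch below defines two hypothetical helpers, `missing_labels` and `save_labelled`; they are not part of matplotlib. They read each axes' labels with `get_xlabel`, `get_ylabel` and `get_title` and refuse to save while any of them is empty.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no screen needed
import matplotlib.pyplot as plt


def missing_labels(ax):
    """Hypothetical helper: names of labels that are still empty on this axes."""
    checks = {
        "xlabel": ax.get_xlabel(),
        "ylabel": ax.get_ylabel(),
        "title": ax.get_title(),
    }
    return [name for name, text in checks.items() if not text.strip()]


def save_labelled(fig, path, **kwargs):
    """Hypothetical helper: save fig only if every axes is fully labelled."""
    for i, ax in enumerate(fig.axes):
        missing = missing_labels(ax)
        if missing:
            raise ValueError(f"axes {i} is missing: {', '.join(missing)}")
    fig.savefig(path, **kwargs)


fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 1])
ax.set_xlabel("week")
print(missing_labels(ax))  # ['ylabel', 'title']
```

Calling `save_labelled(fig, "trend.png")` at this point would raise a `ValueError`; after adding a y label and a title it saves normally.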
*Builds on “groupby, merge, and counting carefully”; leads into “Choosing charts and headless plotting”.*

Tasks
Question 1

What is exploratory data analysis (EDA)?

Question 2

Before a matplotlib figure is useful as evidence, you should always...
