Choosing charts and headless plotting

Intermediate Python for Data Science
Created by Best · 24.06.2026 at 14:03 UTC

Four charts do most of the everyday work, and each answers a different question:

  • a histogram shows the distribution of one variable — its shape, spread, and outliers;
  • a scatter shows the relationship between two variables;
  • a line shows a trend over an ordered axis, usually time;
  • a bar compares groups.

Choosing the wrong chart hides the very thing you're looking for: a scatter won't show you skew, and a histogram won't show you a trend. Tie every mark directly to an array or column so the figure says exactly, and only, what the numbers say.
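The four chart types above can be sketched in one figure, with each mark tied to a named array. This is a minimal sketch, not a prescribed layout: the data is made up (heights, weights, sales, and counts are hypothetical names and values), and the Agg backend is selected only so the script also runs where no display exists.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up data, one array per question being asked.
rng = np.random.default_rng(0)                    # fixed seed for repeatability
heights = rng.normal(170, 10, 200)                # one variable: distribution
weights = 0.5 * heights + rng.normal(0, 5, 200)   # second variable: relationship
months = np.arange(1, 13)                         # ordered axis: trend
sales = np.array([5, 6, 8, 7, 9, 11, 12, 11, 10, 9, 8, 10])
groups = ["A", "B", "C"]                          # categories: comparison
counts = [12, 30, 21]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].hist(heights, bins=20)
axes[0, 0].set_title("Histogram: distribution of heights")

axes[0, 1].scatter(heights, weights, s=10)
axes[0, 1].set_title("Scatter: heights vs. weights")

axes[1, 0].plot(months, sales)
axes[1, 0].set_title("Line: sales by month")

axes[1, 1].bar(groups, counts)
axes[1, 1].set_title("Bar: counts per group")

fig.tight_layout()
out_path = Path("four_charts.png")
fig.savefig(out_path)
```

Each call takes the array it describes and nothing else, so a reader can trace every mark in the saved image back to a specific variable.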

One detail trips people up when plotting outside a notebook: on a server or in automated grading there is no display, so you set a non-interactive backend with matplotlib.use("Agg") before importing pyplot. The Agg backend renders straight to image files (PNG and the like) without needing a graphical window.
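A headless script following that order might look like the sketch below. The data and the file name trend.png are placeholders chosen for illustration; only matplotlib.use("Agg") and the import order come from the text above.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")            # select the backend before pyplot is imported
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])   # illustrative data only
ax.set_xlabel("x")
ax.set_ylabel("x squared")

out_path = Path("trend.png")          # hypothetical output name
fig.savefig(out_path, dpi=150)        # written to disk; no window opens
plt.close(fig)                        # free the figure in long-running jobs

backend = matplotlib.get_backend()    # name of the backend now in use
```

Saving with fig.savefig replaces plt.show() here: with no display there is nothing to show, so the file on disk is the result.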

Keep in mind matplotlib's limitation: it draws what you give it but can't fix it. No plot rescues dirty data, and a beautiful chart of the wrong column is just a confident mistake. What EDA surfaces here sets the agenda for the validation and feature choices that follow, and leads into “Reading JSON and the many faces of missing”.
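As a small illustration of why a plot cannot rescue dirty data, the sketch below bins a hypothetical ages column in which -999 stands in for missing values. Both the column and the sentinel are assumptions made for the example. np.histogram is used to show the bin counts directly, with no figure needed.

```python
import numpy as np

# Hypothetical column: ages, with -999 used as a missing-value sentinel.
ages = np.array([23, 31, 27, -999, 45, 38, -999, 29, 52, 34], dtype=float)

# Binning the raw column: the sentinel stretches the range, so every
# real age lands in the same bin and the shape is lost.
raw_counts, raw_edges = np.histogram(ages, bins=5)

# Treating the sentinel as missing and dropping it first gives bins
# that cover only the real values.
clean = np.where(ages == -999, np.nan, ages)
clean = clean[~np.isnan(clean)]
clean_counts, clean_edges = np.histogram(clean, bins=5)

print("raw counts:  ", raw_counts.tolist())
print("clean counts:", clean_counts.tolist())
```

The chart of the raw column would be drawn without complaint; the problem only shows up if you look at what was binned.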

Related cards
Builds on Why look first, and the figure/axes model · Python for Data Science
Next Reading JSON and the many faces of missing · Python for Data Science
Tasks
Question 1

You want to inspect the distribution of a single numeric column. Which plot is the natural first choice?

Question 2

Why set matplotlib.use("Agg") before importing pyplot in a script that runs on a server or in automated grading?

Card Info
  • Topic: Python for Data Science
  • Difficulty: Intermediate