DataFrames: reading, filtering, adding columns

pandas unifies everything so far into one object: the DataFrame, a table with named columns and a row index. If you know SQL, much of it maps directly. You usually start by reading a file:

import pandas as pd
df = pd.read_csv("events.csv")
df.head()        # peek at the first rows
df.dtypes        # what type did each column get?
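
To try the reading step without a file on disk, here is a minimal sketch that hands read_csv an in-memory buffer instead of a path. The column names and values are invented for illustration; they only mirror the status and score columns used on this card.

```python
import io
import pandas as pd

# Hypothetical stand-in for events.csv; the columns and values are made up.
csv_text = """user,status,score
ann,ok,10
bob,error,4
cat,ok,7
"""

df = pd.read_csv(io.StringIO(csv_text))  # read_csv also accepts file-like objects
print(df.head())     # first rows (at most 5)
print(df.dtypes)     # user and status hold strings, score is an integer column
```

Outside a notebook, head() and dtypes display nothing by themselves, which is why the sketch wraps them in print.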

A single column is a Series; the whole table is a DataFrame. Think of a DataFrame as a dict of equal-length Series that all share one index.
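
The dict picture can be made literal. The sketch below, using made-up data, builds a DataFrame from a dict of Series and then pulls one column back out.

```python
import pandas as pd

# Two equal-length Series with the same index (invented example data).
status = pd.Series(["ok", "error", "ok"], index=[0, 1, 2])
score = pd.Series([10, 4, 7], index=[0, 1, 2])

# Building a DataFrame from a dict: the keys become the column names.
df = pd.DataFrame({"status": status, "score": score})

col = df["score"]                   # selecting one column gives back a Series
print(type(col).__name__)           # Series
print(col.index.equals(df.index))   # True: the column shares the table's index
```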

You select rows with a boolean condition (the mask idea again, now with labels), and you derive new columns with assign, which returns a new DataFrame and leaves the original untouched:

ok = df[df["status"] == "ok"]            # filter   (WHERE status = 'ok')
df2 = df.assign(score2=df["score"] * 2)  # add a column, original untouched

df["status"] == "ok" builds a boolean Series, and indexing the DataFrame with it keeps only the rows where it is True. This is the pandas version of the loop-and-filter you wrote in the very first topic.
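
To see the intermediate mask and confirm that assign leaves the source table alone, here is a small sketch on an invented three-row table (the data is an assumption, not taken from events.csv):

```python
import pandas as pd

df = pd.DataFrame({
    "status": ["ok", "error", "ok"],
    "score": [10, 4, 7],
})

mask = df["status"] == "ok"      # boolean Series, one True/False per row
print(mask.tolist())             # [True, False, True]

ok = df[mask]                    # keeps only the rows where the mask is True
print(ok.index.tolist())         # [0, 2]

df2 = df.assign(score2=df["score"] * 2)   # new DataFrame with an extra column
print(list(df.columns))          # ['status', 'score']  (original unchanged)
print(df2["score2"].tolist())    # [20, 8, 14]
```

Note that the filtered rows keep their original index labels (0 and 2 here); call reset_index(drop=True) if you want them renumbered from zero.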
