Creating and modifying DataFrame columns
The most common thing you do with a DataFrame is derive new columns from existing ones, and the simplest way to do it is vectorized arithmetic: df['sales'] = df['price'] * df['quantity']. There is no loop and no apply. pandas aligns the two Series on their index and multiplies the underlying NumPy arrays element-wise in compiled C code, and the result is stored as a new column in the same DataFrame.
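A minimal sketch of the vectorized multiplication, using a small hypothetical orders table (the values are made up for illustration). The loop version is included only to show that it produces the same numbers; it runs row by row in the Python interpreter.

```python
import pandas as pd

# Hypothetical sample data: three order lines with a unit price and a quantity.
df = pd.DataFrame({
    "price": [2.50, 10.00, 4.00],
    "quantity": [4, 1, 3],
})

# Vectorized: one element-wise multiplication over the whole column, no Python loop.
df["sales"] = df["price"] * df["quantity"]

# The same result with an explicit Python loop, for comparison only;
# this version is much slower on large frames.
looped = [p * q for p, q in zip(df["price"], df["quantity"])]

print(df)
```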
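When the logic is conditional (pass/fail thresholds, discount tiers, flag columns), np.where(condition, value_if_true, value_if_false) is the vectorized if/else. It evaluates the whole boolean mask at once and builds the result array in a single pass, without a Python-level loop. For multi-way branching, np.select(condlist, choicelist, default=0) takes a list of conditions and a list of corresponding values and acts like a vectorized if/elif/else chain: for each row, the first condition that is True determines the value, and rows that match no condition get the default.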
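The sketch below shows both functions on hypothetical data. The pass mark of 50 and the discount tiers are assumptions chosen for illustration, not values from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical exam scores and order totals, used only for illustration.
df = pd.DataFrame({
    "score": [35, 50, 72, 91],
    "total": [20.0, 150.0, 600.0, 1200.0],
})

# np.where: vectorized if/else over the whole column (pass mark of 50 is assumed).
df["result"] = np.where(df["score"] >= 50, "pass", "fail")

# np.select: vectorized if/elif/else. Conditions are checked in list order and
# the first True one wins, so the tiers go from highest to lowest threshold.
conditions = [
    df["total"] >= 1000,
    df["total"] >= 500,
    df["total"] >= 100,
]
discounts = [0.15, 0.10, 0.05]
# Rows that match none of the conditions get the default.
df["discount"] = np.select(conditions, discounts, default=0.0)

print(df)
```

Because the first matching condition wins, the order of the condition list matters: if the tiers were listed from lowest to highest, a total of 1200 would already match `>= 100` and receive only the 0.05 discount.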
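df.assign() returns a new DataFrame with the extra column instead of mutating the original. This matters in method-chaining pipelines such as df.query(...).assign(total=...).groupby(...), where each step should produce a new frame without side effects leaking back into the source. Passing a callable, for example total=lambda d: d['price'] * d['quantity'], lets the new column be computed from the intermediate frame at that point in the chain.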
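A minimal pipeline sketch, assuming a hypothetical orders table with region, price and quantity columns. The lambda passed to assign receives the already-filtered frame, and the final check confirms that the source DataFrame never gained a total column.

```python
import pandas as pd

# Hypothetical order data for illustration.
orders = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "price": [2.5, 10.0, 4.0, 8.0],
    "quantity": [4, 1, 3, 2],
})

summary = (
    orders
    .query("quantity > 1")                               # keep multi-unit orders
    .assign(total=lambda d: d["price"] * d["quantity"])  # callable sees the filtered frame
    .groupby("region")["total"]
    .sum()
)

# assign() returned a new frame; the source DataFrame is unchanged.
print("total" in orders.columns)  # False
print(summary)
```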
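Removing a column is df.drop('col', axis=1), which returns a new DataFrame, or del df['col'], which deletes the column in place. In drop, axis=1 means columns and axis=0 (the default) means rows; df.drop(columns='col') is an equivalent spelling that avoids having to remember which axis number is which.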
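The sketch below contrasts the two approaches on a hypothetical three-column frame: drop leaves the original untouched, while del changes it in place. It also shows axis=0 removing a row by its index label.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# drop() returns a new DataFrame; df itself keeps all three columns.
without_b = df.drop("b", axis=1)
# Equivalent, and more explicit: name the axis by keyword.
without_b_kw = df.drop(columns="b")

# axis=0 (the default) drops rows by index label instead.
without_row0 = df.drop(0, axis=0)

# del mutates df in place and returns nothing.
del df["c"]

print(list(without_b.columns))  # ['a', 'c']
print(list(df.columns))         # ['a', 'b']
```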
DataFrame column operations: [1], np.where reference: [2].
Sources
Card Info
- Topic: Data Science Praktikum
- Difficulty: Beginner