Creating and modifying DataFrame columns

Beginner Data Science Praktikum

Created by Pavel · 03.04.2026 at 11:49 UTC · 1 completed

The most basic DataFrame operation is deriving a new column from existing ones, and the simplest way to do it is vectorized arithmetic: df['sales'] = df['price'] * df['quantity']. No loop and no apply are needed: pandas multiplies the two columns' underlying NumPy arrays element-wise in compiled C, and the result becomes a new column in the same DataFrame.
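As a minimal sketch of the vectorized arithmetic described above, the snippet below builds a small frame and derives sales from it. The sample values are made up for illustration; only the column names price, quantity, and sales come from the text.

```python
import pandas as pd

# Illustrative data (assumed values, not from the lesson)
df = pd.DataFrame({"price": [2.5, 4.0, 10.0], "quantity": [4, 1, 3]})

# Element-wise multiplication of two columns; no loop, no apply
df["sales"] = df["price"] * df["quantity"]

print(df["sales"].tolist())  # [10.0, 4.0, 30.0]
```

The new column is attached to df itself, so the original frame now has three columns.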

When the logic is conditional (pass/fail thresholds, discount tiers, flag columns), np.where(condition, value_if_true, value_if_false) is the vectorized if/else. It evaluates the whole boolean mask at once and fills the result array in one pass. For more than two branches, np.select takes a list of conditions and a list of corresponding values and acts like a vectorized if/elif chain: the first condition that matches a row wins, and the default argument supplies the value for rows that match none.
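The two functions can be compared side by side in a short sketch. The column names order_type and discount, the quantity values, and the tier boundaries are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative order quantities (assumed values)
df = pd.DataFrame({"quantity": [1, 12, 60, 150]})

# Two-way branch: np.where is a vectorized if/else
df["order_type"] = np.where(df["quantity"] >= 50, "bulk", "regular")

# Multi-way branch: conditions are checked in order, first match wins
conditions = [df["quantity"] >= 100, df["quantity"] >= 10]
rates = [0.20, 0.10]
df["discount"] = np.select(conditions, rates, default=0.0)

print(df["order_type"].tolist())  # ['regular', 'regular', 'bulk', 'bulk']
print(df["discount"].tolist())    # [0.0, 0.1, 0.1, 0.2]
```

Because the first matching condition wins, the stricter test (quantity >= 100) has to come before the looser one; reversing the order would give every order of 10 or more the 0.10 rate.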

df.assign() returns a new DataFrame with the extra column instead of mutating the original. This matters in method-chaining pipelines where you want df.query(...).assign(total=...).groupby(...) without side effects leaking back into the source frame.
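A chained pipeline of the shape mentioned above might look like the following sketch. The region column, the filter condition, and the data are assumptions for illustration; the point is only that df is unchanged afterwards.

```python
import pandas as pd

# Illustrative data (assumed values)
df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "price": [10.0, 20.0, 5.0, 8.0],
    "quantity": [1, 2, 4, 0],
})

summary = (
    df.query("quantity > 0")
      # The lambda receives the filtered frame, not the original df
      .assign(total=lambda d: d["price"] * d["quantity"])
      .groupby("region")["total"]
      .sum()
)

print(summary.to_dict())      # {'north': 30.0, 'south': 40.0}
print("total" in df.columns)  # False: the source frame was not modified
```

Passing a callable to assign is useful inside a chain, because the intermediate frame produced by query has no name of its own to refer to.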

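Removing a column is df.drop('col', axis=1) or del df['col']. The two are not interchangeable: drop returns a new DataFrame and leaves df untouched unless you assign the result back (df = df.drop('col', axis=1)), while del removes the column from df in place. The axis=1 in drop means "columns" (axis=0 means rows), a convention that confuses most people at first but soon becomes second nature; df.drop(columns='col') is an equivalent spelling that avoids the axis argument.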
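The sketch below illustrates how the two ways of removing a column behave with respect to the original frame. The frame and its column names a, b, c are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# drop returns a new DataFrame; df keeps all three columns
trimmed = df.drop("c", axis=1)
print(list(trimmed.columns))  # ['a', 'b']
print(list(df.columns))       # ['a', 'b', 'c']

# columns= is an equivalent spelling of axis=1
same = df.drop(columns="c")
print(list(same.columns))     # ['a', 'b']

# del mutates df in place
del df["b"]
print(list(df.columns))       # ['a', 'c']
```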

DataFrame column operations: [1], np.where reference: [2].

Sources


Tasks

Question 1

What does this code print?

import pandas as pd
import numpy as np
df = pd.DataFrame({'score': [85, 42, 60, 73]})
df['grade'] = np.where(df['score'] >= 60, 'Pass', 'Fail')
print(df['grade'].tolist())

Hint

85 ≥ 60 → Pass, 42 < 60 → Fail, 60 ≥ 60 → Pass, 73 ≥ 60 → Pass.

['Pass', 'Fail', 'Pass', 'Pass']

['Pass', 'Fail', 'Fail', 'Pass']

['Pass', 'Fail', 'Pass', 'Fail']

Error — np.where cannot return strings

Question 2

What is the key difference between df['total'] = df['a'] + df['b'] and df2 = df.assign(total=df['a'] + df['b'])?

Hint

Try printing df after each — does it have the 'total' column?

assign() is slower because it copies the entire DataFrame

The first modifies df in place; assign() returns a new DataFrame leaving df unchanged

assign() only works with numeric columns

There is no difference — both modify df

Question 3

Using pandas and numpy, implement add_category(csv_text: str, threshold: int) -> list that reads a CSV with a score column, adds a new column category with 'Pass' where score ≥ threshold and 'Fail' otherwise (using np.where), and returns the category column as a list.

Example: name,score\nAlice,85\nBob,42 with threshold 60 → ['Pass', 'Fail'].

Submit the function; tests use expression mode.

Hint

np.where(df['score'] >= threshold, 'Pass', 'Fail') returns an array you can assign to a new column.

import io
import numpy as np
import pandas as pd


def add_category(csv_text: str, threshold: int) -> list:
    # TODO: read CSV, add 'category' column using np.where, return as list.
    pass

Starter code is prefilled; replace TODO blocks with your solution.


2 test cases will be used for grading

