Merging and concatenating DataFrames

Beginner Data Science Praktikum

Created by Pavel · 03.04.2026 at 11:49 UTC · 1 completed

Data rarely lives in one table. Customers are in one CSV, their orders in another, product details in a third. Bringing them together is the job of pd.merge(), pandas' SQL-style join.
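As an illustration of that customers/orders/products scenario, here is a minimal sketch that chains two merges. The three tables are hypothetical toy data standing in for the three CSV files, and the column names (customer_id, product_id, price) are assumptions chosen for the example:

```python
import pandas as pd

# Hypothetical toy tables standing in for three separate CSV files.
customers = pd.DataFrame({"customer_id": [1, 2], "customer": ["Ana", "Ben"]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "product_id": ["p1", "p2", "p1"],
})
products = pd.DataFrame({"product_id": ["p1", "p2"], "price": [9.5, 20.0]})

# Each merge pulls in the columns of one more table (default how='inner').
full = (
    orders
    .merge(customers, on="customer_id")
    .merge(products, on="product_id")
)
print(full[["order_id", "customer", "price"]])
```

Starting from orders keeps one row per order, since every order refers to exactly one customer and one product.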

pd.merge(df1, df2, on='id', how='inner') keeps only rows whose id appears in both DataFrames, exactly like SQL INNER JOIN ('inner' is also the default for how). Change how to 'left' to keep every row from df1 even when df2 has no match (for those rows, the columns coming from df2 are filled with NaN), 'right' for the mirror image (every row from df2 is kept), or 'outer' to keep all keys from both sides. When the key columns have different names, use left_on and right_on instead of on.
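A small sketch comparing the four join types, followed by a left_on/right_on merge. The frames and column names (key, x, y, emp_id, staff_no) are made-up illustration data:

```python
import pandas as pd

# Illustrative data: key "a" exists only on the left, "d" only on the right.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Which keys survive under each join type?
kept = {
    how: pd.merge(left, right, on="key", how=how)["key"].tolist()
    for how in ["inner", "left", "right", "outer"]
}
print(kept)

# In a left join, the unmatched key "a" gets NaN in the column from `right`.
left_joined = pd.merge(left, right, on="key", how="left")
print(left_joined)

# Key columns with different names: left_on / right_on instead of on.
staff = pd.DataFrame({"emp_id": [1, 2], "name": ["A", "B"]})
pay = pd.DataFrame({"staff_no": [2, 1], "salary": [4000, 3500]})
joined = pd.merge(staff, pay, left_on="emp_id", right_on="staff_no")
print(joined)  # both emp_id and staff_no are kept as columns
```

Because both key columns survive a left_on/right_on merge, you may want to drop one of them afterwards.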

If a non-key column name exists in both frames, pandas disambiguates it with the default suffixes _x (left frame) and _y (right frame). That is fine for quick exploration, but passing suffixes=('_left', '_right') makes the output self-documenting.
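A minimal sketch of the suffix behaviour, assuming two toy frames (q1, q2) that both carry a column named score:

```python
import pandas as pd

# Illustrative data: both frames have a non-key column called "score".
q1 = pd.DataFrame({"id": [1, 2], "score": [70, 85]})
q2 = pd.DataFrame({"id": [1, 2], "score": [75, 80]})

default = pd.merge(q1, q2, on="id")
print(default.columns.tolist())  # ['id', 'score_x', 'score_y']

named = pd.merge(q1, q2, on="id", suffixes=("_left", "_right"))
print(named.columns.tolist())  # ['id', 'score_left', 'score_right']
```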

pd.concat() solves a different problem: stacking DataFrames that share the same schema. pd.concat([jan, feb, mar], axis=0) appends rows vertically, as when you read twelve monthly files into one big table. axis=1 places frames side by side (adding columns) and aligns rows by index label; a label present in only one frame yields NaN in the other frame's columns, so check that the indexes match first.
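The sketch below shows both directions. The monthly frames are hypothetical stand-ins for real files, and ignore_index=True is an extra option (not mentioned above) that renumbers the stacked rows instead of repeating each file's 0, 1, ... labels:

```python
import pandas as pd

# Hypothetical monthly extracts with identical columns.
jan = pd.DataFrame({"day": [1, 2], "sales": [100, 120]})
feb = pd.DataFrame({"day": [1, 2], "sales": [90, 110]})
mar = pd.DataFrame({"day": [1, 2], "sales": [130, 95]})

# axis=0 stacks rows; ignore_index=True gives a fresh 0..n-1 index.
stacked = pd.concat([jan, feb, mar], axis=0, ignore_index=True)
print(stacked.shape)  # (6, 2)

# axis=1 aligns on index labels: label 0 exists only in `base` and
# label 2 only in `extra`, so those rows get NaN in the missing column.
base = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
extra = pd.DataFrame({"b": [10, 20]}, index=[1, 2])
side = pd.concat([base, extra], axis=1)
print(side)
```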

A common source of confusion: merge joins on column values, while df1.join(df2) joins on the index (by default, df1's index against df2's index). Most of the time you want merge.
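To make the difference concrete, here is a sketch with the same lookup table stored two ways, once with id as a column and once as the index. The data is illustrative; join's on= parameter (join a caller column against the other frame's index) is standard pandas but goes beyond the text above:

```python
import pandas as pd

people = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
cities_col = pd.DataFrame({"id": [1, 2], "city": ["Oslo", "Rome"]})
cities_idx = cities_col.set_index("id")  # same data, id moved to the index

# merge: match on the values of the "id" column in both frames.
via_merge = pd.merge(people, cities_col, on="id")

# join: match on the index, so move "id" into people's index first...
via_join = people.set_index("id").join(cities_idx)

# ...or keep it as a column and tell join to use it via on=.
via_on = people.join(cities_idx, on="id")

# Pitfall: without that, people's RangeIndex (0, 1) is matched against
# ids (1, 2), so row 0 gets NaN and row 1 gets the city for id 1.
wrong = people.join(cities_idx)
print(wrong)
```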

For more detail, see the pandas merging guide [1].

Sources

[1] https://pandas.pydata.org/docs/user_guide/merging.html

Tasks

Question 1

What does this code print?

import pandas as pd
df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['A', 'B', 'C']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'bonus': [500, 300, 700]})
result = pd.merge(df1, df2, on='id', how='left')
print(result['bonus'].tolist())

Hint

Left join keeps ALL rows from df1. id=1 has no match in df2.

[nan, 500, 300]

[500, 300, 700]

[500, 300]

Error — id=1 has no match

Question 2

When should you use pd.concat([df1, df2], axis=0) instead of pd.merge(df1, df2, ...)?

Hint

Think about reading 12 monthly CSVs and combining them.

When you want to match rows by a shared key column

When both DataFrames have the same columns and you want to stack their rows

When you need a cross join

When you want to add columns from df2 to df1 by matching a key

Question 3

Using pandas, implement merge_bonus_total(emp_csv: str, bonus_csv: str) -> float. Both arguments hold CSV text, as in the example below. The function left-merges employees with bonuses on id, fills missing bonuses with 0, and returns the total bonus as a float.

Example: id,name\n1,Alice\n2,Bob\n3,Charlie merged with id,bonus\n1,500\n3,300 → 800.0.

Submit the function; tests use expression mode.

Hint

Left merge keeps all employees. fillna(0) turns missing bonuses into zero before summing.

import io
import pandas as pd


def merge_bonus_total(emp_csv: str, bonus_csv: str) -> float:
    # TODO: left-merge on 'id', fill NaN bonuses with 0, return sum.
    pass

Starter code is prefilled; replace TODO blocks with your solution.

2 test cases will be used for grading
