Merging and concatenating DataFrames
Data rarely lives in one table. Customers are in one CSV, their orders in another, product details in a third. Bringing them together is the job of pd.merge(), pandas' SQL-style join.
pd.merge(df1, df2, on='id', how='inner') keeps only rows whose id appears in both DataFrames — exactly like SQL INNER JOIN. Change how to 'left' to keep every row from df1 even when df2 has no match (unmatched df2 columns become NaN), 'right' for the mirror, or 'outer' to keep everything from both sides. When the key columns have different names, use left_on and right_on instead of on.
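The snippet below is a minimal sketch of the join types described above. The customers/orders frames and their values are made up for illustration. Customer 1 has two orders, customer 3 has none, and order id 4 has no matching customer. Note that left_on/right_on keeps both key columns in the result.

```python
import pandas as pd

# Hypothetical example data (not from any real dataset)
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cai"]})
orders = pd.DataFrame({"id": [1, 1, 2, 4], "amount": [10.0, 25.0, 7.5, 99.0]})

# inner: only ids present in both frames (1 and 2); id 1 appears twice (two orders)
inner = pd.merge(customers, orders, on="id", how="inner")

# left: every customer is kept; Cai (id 3) has no order, so amount is NaN
left = pd.merge(customers, orders, on="id", how="left")

# outer: union of keys; id 3 has no amount, id 4 has no name
outer = pd.merge(customers, orders, on="id", how="outer")

# Key columns with different names: use left_on / right_on instead of on
orders_renamed = orders.rename(columns={"id": "customer_id"})
by_name = pd.merge(customers, orders_renamed,
                   left_on="id", right_on="customer_id", how="inner")

print(len(inner), len(left), len(outer))  # 3 4 5
```

Counting rows after a merge like this is a cheap sanity check: an inner join that grows unexpectedly usually means the key is duplicated on one side.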
If the same column name (other than the key) exists in both frames, pandas appends _x and _y suffixes — fine for quick exploration, but use suffixes=('_left','_right') to make the output self-documenting.
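Here is a small sketch of the suffix behaviour, assuming two hypothetical frames that both carry a non-key price column. The second merge uses the suffixes suggested above. Any pair of strings works, so you can also name them after the source tables.

```python
import pandas as pd

# Hypothetical frames: both have a non-key column called "price"
catalog = pd.DataFrame({"sku": ["A", "B"], "price": [9.99, 4.50]})
sale = pd.DataFrame({"sku": ["A", "B"], "price": [7.99, 4.00]})

# Default suffixes: the clashing column becomes price_x / price_y
default = pd.merge(catalog, sale, on="sku")
print(list(default.columns))  # ['sku', 'price_x', 'price_y']

# Explicit suffixes make the origin of each column obvious
labelled = pd.merge(catalog, sale, on="sku", suffixes=("_left", "_right"))
print(list(labelled.columns))  # ['sku', 'price_left', 'price_right']
```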
pd.concat() solves a different problem: stacking DataFrames that share the same schema. pd.concat([jan, feb, mar], axis=0) appends rows vertically, like reading twelve monthly files into one big table. axis=1 places frames side by side (adding columns) and matches rows by index label, so a label that exists in only one frame produces NaN in the other frame's columns.
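A minimal sketch of both directions, using hypothetical monthly frames. Two options go beyond what the text mentions but are standard pd.concat parameters. ignore_index=True renumbers the stacked rows instead of repeating 0, 1, 0, 1, .... keys= labels each block so you can still tell which month a row came from. The axis=1 part uses deliberately mismatched indexes to show the alignment.

```python
import pandas as pd

# Hypothetical monthly frames sharing the same schema
jan = pd.DataFrame({"day": [1, 2], "sales": [100, 120]})
feb = pd.DataFrame({"day": [1, 2], "sales": [90, 130]})
mar = pd.DataFrame({"day": [1, 2], "sales": [110, 105]})

# axis=0 stacks rows; ignore_index=True gives a fresh 0..n-1 index
q1 = pd.concat([jan, feb, mar], axis=0, ignore_index=True)
print(q1.shape)  # (6, 2)

# keys= adds an outer index level naming the source of each block
q1_labelled = pd.concat([jan, feb, mar], keys=["jan", "feb", "mar"])

# axis=1 aligns on the index: label 0 exists only in part_a, label 2 only in part_b
part_a = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
part_b = pd.DataFrame({"b": [3, 4]}, index=[1, 2])
wide = pd.concat([part_a, part_b], axis=1)
print(wide.shape)  # (3, 2)
```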
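A common source of confusion: merge joins on column values, while df1.join(df2) matches on the index of df2 (and on the index of df1, unless you name one of df1's columns with on=) and defaults to a left join. Most of the time you want merge.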
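To see the difference, here is a sketch with two hypothetical frames that both use the customer id as their index. join lines up the index labels directly. merge has to be told explicitly to use the indexes (left_index/right_index) instead of columns.

```python
import pandas as pd

# Hypothetical data: both frames are indexed by customer id
customers = pd.DataFrame({"name": ["Ana", "Ben", "Cai"]},
                         index=pd.Index([1, 2, 3], name="id"))
totals = pd.DataFrame({"total": [35.0, 7.5]},
                      index=pd.Index([1, 2], name="id"))

# join matches index labels and defaults to how="left": Cai (id 3) gets NaN
joined = customers.join(totals)

# The equivalent merge call must switch from columns to indexes explicitly
merged = pd.merge(customers, totals,
                  left_index=True, right_index=True, how="left")

print(joined)
```

If your key lives in an ordinary column rather than the index, plain pd.merge(..., on=...) avoids this bookkeeping, which is one reason merge is the usual choice.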
Merging guide: [1].
Sources
Card Info
- Topic: Data Science Praktikum
- Difficulty: Beginner