Reshaping: pivot tables and melt
Data comes in two shapes and you will constantly convert between them.
Wide format puts each measured variable in its own column: jan, feb, mar side by side. Spreadsheets love this; humans read it easily. But try filtering "only March data" or plotting a monthly trend — suddenly you are writing column-name logic instead of value-based filters.
Long (tidy) format stacks those months into two columns: month (the variable name) and sales (the value). Now every observation is its own row, filtering is df[df['month']=='mar'], and plotting libraries map month to the x-axis without gymnastics.
pd.melt() converts wide → long: pd.melt(df, id_vars=['product'], value_vars=['jan','feb','mar'], var_name='month', value_name='sales') unpivots month columns into rows. It is the operation you need when someone hands you a spreadsheet with one column per time period.
pd.pivot_table() goes the other way: pd.pivot_table(df, values='sales', index='region', columns='product', aggfunc='sum', fill_value=0) builds a cross-tabulation — one row per region, one column per product, summed sales at each intersection. Unlike bare pivot, pivot_table handles duplicate (index, columns) pairs by aggregating.
Rule of thumb: if your analysis requires filtering on what is currently a column name, melt first. If your report needs a spreadsheet-style summary, pivot.
Reshaping guide: [1].
Sources
Tasks
Card Info
- Topic: Data Science Praktikum
- Difficulty: Intermediate
- Completed: 0 users