Reshaping: pivot tables and melt

Intermediate Data Science Praktikum

Created by Pavel · 03.04.2026 at 11:49 UTC

Data comes in two shapes and you will constantly convert between them.

Wide format puts each measured variable in its own column: jan, feb, mar side by side. Spreadsheets love this; humans read it easily. But try filtering "only March data" or plotting a monthly trend — suddenly you are writing column-name logic instead of value-based filters.

Long (tidy) format stacks those months into two columns: month (the variable name) and sales (the value). Now every observation is its own row, filtering is df[df['month']=='mar'], and plotting libraries map month to the x-axis without gymnastics.

pd.melt() converts wide → long: pd.melt(df, id_vars=['product'], value_vars=['jan','feb','mar'], var_name='month', value_name='sales') unpivots month columns into rows. It is the operation you need when someone hands you a spreadsheet with one column per time period.

pd.pivot_table() goes the other way: pd.pivot_table(df, values='sales', index='region', columns='product', aggfunc='sum', fill_value=0) builds a cross-tabulation — one row per region, one column per product, summed sales at each intersection. Unlike bare pivot, pivot_table handles duplicate (index, columns) pairs by aggregating.

Rule of thumb: if your analysis requires filtering on what is currently a column name, melt first. If your report needs a spreadsheet-style summary, pivot.

Reshaping guide: [1].

Sources

[1]https://pandas.pydata.org/docs/user_guide/reshaping.html Return to text

University approvals: 0

Tasks

Question 1

What shape does pd.melt(df, id_vars=['product'], value_vars=['jan','feb']) produce if df has 3 product rows?

Hint

Each original row is unpivoted into one row per value_var.

3 rows × 3 columns

6 rows × 3 columns (product, variable, value)

2 rows × 4 columns

3 rows × 4 columns

Question 2

What does this code print?

import pandas as pd
df = pd.DataFrame({'region': ['East','East','West','West'],
                   'product': ['A','B','A','B'],
                   'sales': [100, 200, 150, 50]})
pt = pd.pivot_table(df, values='sales', index='region',
                    columns='product', aggfunc='sum')
print(pt.loc['East', 'B'])

Hint

There is exactly one row with region=East and product=B.

200

250

100

Error — duplicate index entries

Question 3

Using pandas, implement melt_months(csv_text: str) -> int that reads a CSV where the first column is product and all other columns are month names. Melt the month columns into long format with var_name='month' and value_name='sales', then return the number of rows in the melted result.

Example: product,jan,feb,mar\nA,10,20,30\nB,40,50,60 → 6.

Submit the function; tests use expression mode.

Hint

Number of melted rows = number of original rows × number of value columns.

import io
import pandas as pd


def melt_months(csv_text: str) -> int:
    # TODO: read CSV, melt all non-product columns, return row count.
    pass

Starter code is prefilled; replace TODO blocks with your solution.

Runtime output (stdout/stderr)

2 test cases will be used for grading

Run checks runtime behavior only. Final correctness is evaluated when you submit.

Card Info

Topic: Data Science Praktikum
Difficulty: Intermediate
Completed: 0 users

Creator

Pavel