Selecting data with loc, iloc, and boolean indexing

Beginner Data Science Praktikum
Created by Pavel · 03.04.2026 at 11:49 UTC

Three selection tools live side by side in pandas, and confusing them is one of the most common early mistakes.

loc is label-based: you name the rows and columns you want by their index labels and column names. df.loc[2:5, 'name':'city'] includes both endpoints, so the row labelled 5 and the column city are part of the result. This differs from ordinary Python slicing, which excludes the endpoint, and the difference trips people up regularly.
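
A minimal sketch of the inclusive endpoints, using a small made-up DataFrame (the column names name, age, city, score and all values are illustrative assumptions chosen to match the slice above; the default index supplies the labels 0 to 5):

```python
import pandas as pd

# Hypothetical data; with the default index the row labels are 0..5.
df = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cara', 'Dev', 'Eli', 'Fay'],
    'age': [23, 35, 31, 42, 28, 39],
    'city': ['NYC', 'LA', 'NYC', 'SF', 'LA', 'NYC'],
    'score': [88, 72, 95, 60, 81, 77],
})

# Label-based slice: rows labelled 2 through 5 and columns 'name' through 'city',
# with BOTH endpoints included.
subset = df.loc[2:5, 'name':'city']
print(subset.index.tolist())    # [2, 3, 4, 5]
print(subset.columns.tolist())  # ['name', 'age', 'city']

# Ordinary Python slicing, for comparison, stops before the endpoint.
print(list(range(6))[2:5])      # [2, 3, 4]
```

The column 'score' sits after 'city', so it is the only column left out.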

iloc is position-based: df.iloc[0:5, 0:2] means "the first five rows and the first two columns, by integer offset" and follows normal Python slicing rules (endpoint excluded). It works reliably even when the index is messy after a merge or filter, because positions always run 0, 1, 2, …
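
To see why positions are the safer choice after a filter, here is a small sketch with hypothetical data: once rows are dropped, the surviving labels have gaps, but iloc still counts from 0.

```python
import pandas as pd

# Hypothetical data; filtering leaves gaps in the index labels.
df = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cara', 'Dev', 'Eli', 'Fay'],
    'age': [23, 35, 31, 42, 28, 39],
    'city': ['NYC', 'LA', 'NYC', 'SF', 'LA', 'NYC'],
})
older = df[df['age'] > 30]
print(older.index.tolist())            # [1, 2, 3, 5]

# iloc counts positions: [0:2] is "the first two rows", whatever their labels.
first_two = older.iloc[0:2]
print(first_two['name'].tolist())      # ['Ben', 'Cara']

# Columns are positional too: column 0 is 'name', column 1 is 'age'.
print(older.iloc[0:2, 0:2].columns.tolist())  # ['name', 'age']

# loc looks up labels, and label 0 was filtered away, so this raises KeyError.
try:
    older.loc[0]
except KeyError:
    print("no row labelled 0")
```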

Boolean indexing is the daily workhorse for filtering. A condition like df['age'] > 30 returns a True/False Series that you pass back into df[...]. Combine conditions with & (AND) and | (OR), wrapping each one in parentheses: df[(df['age'] > 30) & (df['city'] == 'NYC')]. The parentheses are mandatory because Python's & binds tighter than > and ==. Without them the expression is read as df['age'] > (30 & df['city']) == 'NYC', and for a text column like city the stray 30 & df['city'] raises a confusing TypeError. Using Python's and/or instead of &/| is the other classic trap: those keywords need a single True or False rather than a whole Series, so pandas raises ValueError ("The truth value of a Series is ambiguous") instead of combining the masks element-wise.
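
The sketch below builds the combined mask step by step on hypothetical data, then shows the ValueError that the and keyword produces:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cara', 'Dev'],
    'age': [23, 35, 31, 42],
    'city': ['NYC', 'LA', 'NYC', 'SF'],
})

# Each parenthesised comparison yields a boolean Series; & combines them row by row.
mask = (df['age'] > 30) & (df['city'] == 'NYC')
print(mask.tolist())                   # [False, False, True, False]
print(df[mask]['name'].tolist())       # ['Cara']

# OR: older than 40, or living in LA.
either = (df['age'] > 40) | (df['city'] == 'LA')
print(df[either]['name'].tolist())     # ['Ben', 'Dev']

# `and` asks for one True/False from a whole Series, which pandas refuses.
try:
    df[(df['age'] > 30) and (df['city'] == 'NYC')]
except ValueError as err:
    print(type(err).__name__)          # ValueError
```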

df['col'].isin(['A', 'B']) is a cleaner way to test membership than chaining | conditions, and ~ negates a mask.
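
A short sketch of isin and ~ together, again on made-up data:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cara', 'Dev'],
    'city': ['NYC', 'LA', 'NYC', 'SF'],
})

# Same result as (df['city'] == 'NYC') | (df['city'] == 'LA'), in one call.
in_two_cities = df['city'].isin(['NYC', 'LA'])
print(df[in_two_cities]['name'].tolist())   # ['Ann', 'Ben', 'Cara']

# ~ flips every True/False in the mask, selecting the remaining rows.
print(df[~in_two_cities]['name'].tolist())  # ['Dev']
```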

Full indexing guide: [1].


Sources

Tasks
Question 1

What does this code print?

import pandas as pd
df = pd.DataFrame({'name': ['Alice','Bob','Charlie','Diana'],
                   'score': [85, 42, 73, 91]},
                  index=[10, 20, 30, 40])
print(df.iloc[1:3]['name'].tolist())
Hint

iloc uses integer positions (0-based, endpoint excluded), not the custom index labels.

Question 2

A student writes df[df['age'] > 30 & df['city'] == 'NYC'] without parentheses. What happens?

Hint

Operator precedence: & evaluates before comparison operators.

Question 3

Using pandas, implement names_above(csv_text: str, threshold: int) -> list that reads a CSV with columns name and score, uses boolean indexing with loc to select rows where score >= threshold, and returns the matching names as a list.

Example: name,score\nAlice,85\nBob,42\nCharlie,73 with threshold 70 → ['Alice', 'Charlie'].

Submit the function; tests use expression mode (pandas is available).

Hint

df.loc[boolean_mask, 'column_name'] selects rows matching the mask and returns just that column.

Starter code is prefilled; replace TODO blocks with your solution.
2 test cases will be used for grading
The Run button checks runtime behavior only; final correctness is evaluated when you submit.