Selecting data with loc, iloc, and boolean indexing

Beginner Data Science Praktikum

Created by Pavel · 03.04.2026 at 11:49 UTC

Three selection tools live side by side in pandas, and confusing them is one of the most common early mistakes.

loc is label-based: you name the rows and columns you want by their index labels and column names. df.loc[2:5, 'name':'city'] includes both endpoints — row 5 and column city are part of the result. This is unlike Python slicing, which excludes the endpoint, and the difference trips people up regularly.
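The inclusive-endpoint rule is easiest to see side by side with a plain Python slice. The following is a minimal sketch; the DataFrame contents are made up for illustration, and only the column names name and city come from the text above.

```python
import pandas as pd

# Hypothetical sample data; the default index labels are 0..6.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cleo", "Dev", "Eli", "Fay", "Gus"],
        "age": [25, 31, 47, 19, 38, 52, 29],
        "city": ["NYC", "LA", "NYC", "SF", "LA", "NYC", "SF"],
    }
)

subset = df.loc[2:5, "name":"city"]

# Label slices include both endpoints: rows 2..5 and columns name..city.
print(subset.index.tolist())    # [2, 3, 4, 5]
print(subset.columns.tolist())  # ['name', 'age', 'city']

# A plain Python slice over the same bounds stops before 5.
plain = list(range(7))[2:5]
print(plain)                    # [2, 3, 4]
```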

iloc is position-based: df.iloc[0:5, 0:2] means "first five rows, first two columns by integer offset" and follows normal Python slicing rules (endpoint excluded). It works reliably even when the index is messy after a merge or filter — positions are always 0, 1, 2, …
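As a small illustration of why positions stay usable after a filter, the sketch below filters a made-up DataFrame (the data is an assumption, not from the lesson) and then selects by position.

```python
import pandas as pd

# Hypothetical sample data with the default index 0..5.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cleo", "Dev", "Eli", "Fay"],
        "age": [25, 31, 47, 19, 38, 52],
    }
)

# Filtering keeps the original labels, so the index now has gaps.
older = df[df["age"] > 30]
print(older.index.tolist())        # [1, 2, 4, 5]

# iloc ignores those labels and counts from 0; the endpoint is excluded.
first_two = older.iloc[0:2, 0:1]
print(first_two["name"].tolist())  # ['Ben', 'Cleo']
```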

Boolean indexing is the daily workhorse for filtering. A condition like df['age'] > 30 returns a True/False Series that you pass back into df[...]. Combine conditions with & (AND) and | (OR), wrapping each in parentheses: df[(df['age'] > 30) & (df['city'] == 'NYC')]. The parentheses are mandatory because Python's & binds tighter than > and ==, so without them Python first tries to evaluate 30 & df['city'], which typically ends in a confusing TypeError. Using Python's and/or instead of &/| is the other classic trap: those keywords need a single True/False value and cannot operate on a Series element-wise, so pandas raises a ValueError saying the truth value of a Series is ambiguous.
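A short sketch of the pattern, using made-up data (only the column names age and city come from the text). It builds the combined mask first so you can inspect it, then shows what happens when and is used in place of &.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cleo", "Dev"],
        "age": [25, 31, 47, 52],
        "city": ["NYC", "LA", "NYC", "NYC"],
    }
)

# Each condition sits in its own parentheses; & combines them element-wise.
mask = (df["age"] > 30) & (df["city"] == "NYC")
print(mask.tolist())              # [False, False, True, True]
print(df[mask]["name"].tolist())  # ['Cleo', 'Dev']

# `and` asks for one truth value for the whole Series, which pandas refuses.
and_failed = False
try:
    df[(df["age"] > 30) and (df["city"] == "NYC")]
except ValueError as exc:
    and_failed = True
    print("ValueError:", exc)
```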

df['col'].isin(['A', 'B']) is a cleaner way to test membership than chaining | conditions, and ~ negates a mask.
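To see that isin matches the chained form and how ~ flips a mask, here is a minimal sketch on a made-up DataFrame; the value column is a hypothetical addition for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col": ["A", "B", "C", "A", "D"], "value": [1, 2, 3, 4, 5]})

in_ab = df["col"].isin(["A", "B"])
chained = (df["col"] == "A") | (df["col"] == "B")

# Both masks mark the same rows.
print(in_ab.equals(chained))         # True

# Rows where col is A or B, then the complement via ~.
print(df[in_ab]["value"].tolist())   # [1, 2, 4]
print(df[~in_ab]["value"].tolist())  # [3, 5]
```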

Full indexing guide: [1].

Sources

[1] https://pandas.pydata.org/docs/user_guide/indexing.html

Tasks

Question 1

What does this code print?

import pandas as pd
df = pd.DataFrame({'name': ['Alice','Bob','Charlie','Diana'],
                   'score': [85, 42, 73, 91]},
                  index=[10, 20, 30, 40])
print(df.iloc[1:3]['name'].tolist())

Hint

iloc uses integer positions (0-based, endpoint excluded), not the custom index labels.

['Bob', 'Charlie']

['Alice', 'Bob', 'Charlie']

['Bob', 'Charlie', 'Diana']

KeyError

Question 2

A student writes df[df['age'] > 30 & df['city'] == 'NYC'] without parentheses. What happens?

Hint

Operator precedence: & evaluates before comparison operators.

It works correctly, filtering both conditions

Python raises a TypeError because & binds tighter than > and ==

It returns all rows where city is 'NYC', ignoring the age filter

It returns an empty DataFrame

Question 3

Using pandas, implement names_above(csv_text: str, threshold: int) -> list that reads a CSV with columns name and score, uses boolean indexing with loc to select rows where score >= threshold, and returns the matching names as a list.

Example: name,score\nAlice,85\nBob,42\nCharlie,73 with threshold 70 → ['Alice', 'Charlie'].

Submit the function; tests use expression mode (pandas is available).

Hint

df.loc[boolean_mask, 'column_name'] selects rows matching the mask and returns just that column.

import io
import pandas as pd


def names_above(csv_text: str, threshold: int) -> list:
    # TODO: read CSV into DataFrame, use loc with boolean mask, return list of names.
    pass

Starter code is prefilled; replace TODO blocks with your solution.

2 test cases will be used for grading

Run checks runtime behavior only. Final correctness is evaluated when you submit.
