Selecting data with loc, iloc, and boolean indexing
Three selection tools live side by side in pandas, and confusing them is one of the most common early mistakes.
loc is label-based: you name the rows and columns you want by their index labels and column names. df.loc[2:5, 'name':'city'] includes both endpoints, so the row labelled 5 and the column city are part of the result. Python's own slicing excludes the endpoint, and this difference trips people up regularly.
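A minimal sketch of label-based slicing. It assumes a small made-up DataFrame (the names, ages, cities, and scores are hypothetical) with the default integer index 0 to 6:

```python
import pandas as pd

# Hypothetical sample data; the default index labels are 0, 1, ..., 6.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cai", "Dee", "Eli", "Fay", "Gus"],
        "age": [25, 34, 41, 29, 52, 38, 23],
        "city": ["NYC", "LA", "NYC", "SF", "NYC", "LA", "SF"],
        "score": [88, 92, 75, 81, 95, 70, 64],
    }
)

# Label-based slice: rows labelled 2 through 5 and columns 'name' through 'city'.
# Both endpoints are included, so this yields 4 rows and 3 columns.
subset = df.loc[2:5, "name":"city"]
print(subset.shape)  # (4, 3)

# The equivalent positional slice has to move both stops one step further,
# because iloc excludes the endpoint.
same = df.iloc[2:6, 0:3]
print(subset.equals(same))  # True
```

Here the labels happen to coincide with positions, which is exactly why the endpoint rule is easy to miss: df.loc[2:5] returns four rows, while df.iloc[2:5] returns three.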
iloc is position-based: df.iloc[0:5, 0:2] means "first five rows, first two columns by integer offset" and follows normal Python slicing rules (endpoint excluded). It works reliably even when the index labels have gaps or are out of order after a merge or filter, because positions are always 0, 1, 2, …
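The sketch below contrasts iloc with loc after a filter has left gaps in the index. The sample data is hypothetical:

```python
import pandas as pd

# Hypothetical sample data with the default index 0..6.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cai", "Dee", "Eli", "Fay", "Gus"],
        "age": [25, 34, 41, 29, 52, 38, 23],
        "city": ["NYC", "LA", "NYC", "SF", "NYC", "LA", "SF"],
    }
)

# Position-based slice: first five rows and first two columns (endpoints excluded).
first = df.iloc[0:5, 0:2]
print(first.shape)  # (5, 2)

# Filtering keeps the original labels, so the index now has gaps.
older = df[df["age"] > 30]
print(list(older.index))  # [1, 2, 4, 5]

# iloc[0] still means "the first remaining row", whatever its label is.
print(older.iloc[0]["name"])  # Ben

# loc[0] asks for the label 0, which the filter removed.
try:
    older.loc[0]
except KeyError:
    print("no row labelled 0")
```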
Boolean indexing is the daily workhorse for filtering. A condition like df['age'] > 30 returns a True/False Series that you pass back into df[...]. Combine conditions with & (AND) and | (OR), wrapping each one in parentheses: df[(df['age'] > 30) & (df['city'] == 'NYC')]. The parentheses are mandatory because Python's & binds tighter than > and ==. Without them the expression is parsed differently and fails with a confusing error: a TypeError or a "truth value of a Series is ambiguous" ValueError, depending on the column types. Using Python's and/or instead of &/| is the other classic trap. Those keywords call bool() on the whole Series instead of working element-wise, which raises the ambiguous-truth-value ValueError.
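A short sketch of combining masks with & and |, and of what happens with `and`. The DataFrame is hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cai", "Dee", "Eli", "Fay", "Gus"],
        "age": [25, 34, 41, 29, 52, 38, 23],
        "city": ["NYC", "LA", "NYC", "SF", "NYC", "LA", "SF"],
    }
)

# A comparison returns a boolean Series with one True/False per row.
over_30 = df["age"] > 30

# AND: each condition in its own parentheses, joined with &.
nyc_over_30 = df[over_30 & (df["city"] == "NYC")]
print(nyc_over_30["name"].tolist())  # ['Cai', 'Eli']

# OR: the same pattern with |.
old_or_sf = df[(df["age"] > 50) | (df["city"] == "SF")]
print(old_or_sf["name"].tolist())  # ['Dee', 'Eli', 'Gus']

# The `and` keyword calls bool() on a whole Series, which pandas refuses.
message = ""
try:
    df[(df["age"] > 30) and (df["city"] == "NYC")]
except ValueError as err:
    message = str(err)
    print(message)  # starts with "The truth value of a Series is ambiguous"
```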
df['col'].isin(['A', 'B']) is a cleaner way to test membership than chaining | conditions, and ~ negates a mask.
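A sketch of isin and ~ on hypothetical sample data. It also checks that isin matches the chained | version and shows one precedence detail worth knowing: ~ binds tighter than comparison operators, so negating a comparison needs its own parentheses.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cai", "Dee", "Eli", "Fay", "Gus"],
        "age": [25, 34, 41, 29, 52, 38, 23],
        "city": ["NYC", "LA", "NYC", "SF", "NYC", "LA", "SF"],
    }
)

# Membership test with isin...
coastal = df[df["city"].isin(["NYC", "LA"])]

# ...selects the same rows as the chained | version.
chained = df[(df["city"] == "NYC") | (df["city"] == "LA")]
print(coastal.equals(chained))  # True

# ~ flips every value in the mask: rows whose city is NOT in the list.
# No extra parentheses needed, because the method call is evaluated before ~.
others = df[~df["city"].isin(["NYC", "LA"])]
print(others["name"].tolist())  # ['Dee', 'Gus']

# ~ binds tighter than >, so write ~(df["age"] > 30), not ~df["age"] > 30.
not_over_30 = df[~(df["age"] > 30)]
print(not_over_30["name"].tolist())  # ['Ana', 'Dee', 'Gus']
```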
Full indexing guide: [1].
Sources
Card Info
- Topic: Data Science Praktikum
- Difficulty: Beginner