CSV files and tabular I/O in Python

Beginner Data Science Praktikum
Created by Pavel · 21.03.2026 at 01:05 UTC · 1 completed

Most data still arrives as rows in a text file: someone exports a spreadsheet, a sensor dumps a log, or an open-data portal offers a download. You pull it in, turn strings into numbers, and aggregate. The first trap is treating each line as "split on commas." A free-text column might hold "Zurich, downtown" inside quotes; a naive split cuts that one field in two, so a two-column row comes back with three columns and every later value silently shifts out of place.
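
To make the trap concrete, here is a minimal sketch comparing a naive split with Python's standard csv module on a single line. The sample line and variable names are made up for illustration.

```python
import csv
import io

# Hypothetical two-column row: a quoted free-text field, then a number.
line = '"Zurich, downtown",400'

# Naive approach: the comma inside the quotes is treated as a delimiter.
naive_fields = line.split(",")

# csv.reader applies the quoting rules and keeps the quoted field whole.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), naive_fields)    # 3 fields, quotes left in place
print(len(parsed_fields), parsed_fields)  # 2 fields, quotes removed
```

The naive result also keeps the quote characters and the leading space as part of the data, so even rejoining the pieces afterwards does not restore the original field cleanly.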

The csv module and pandas.read_csv exist because real CSV has rules for quoting and delimiters. Once the table is in memory, a second trap appears: European exports often use commas as decimal separators (1,5 vs 1.5), so you learn to normalize the separator before casting to float. That journey, from messy export to trustworthy dataframe, is what this stack is for.
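
As a rough sketch of that normalization step, the snippet below reads a small export with the standard csv module and converts decimal commas to floats. The sample data, the semicolon delimiter, and the helper name parse_decimal_comma are assumptions for illustration; the helper also assumes the values contain no thousands separators.

```python
import csv
import io

# Hypothetical export: semicolon-delimited, comma as decimal separator.
raw = "city;rainfall_mm\nZurich;1,5\nBern;12,25\n"


def parse_decimal_comma(text: str) -> float:
    """Convert a string such as '1,5' to 1.5 (no thousands separators assumed)."""
    return float(text.strip().replace(",", "."))


# DictReader uses the first line as column names.
reader = csv.DictReader(io.StringIO(raw), delimiter=";")
rainfall = {row["city"]: parse_decimal_comma(row["rainfall_mm"]) for row in reader}

print(rainfall)
```

With pandas, the same effect is usually available directly through the sep and decimal parameters of pandas.read_csv, which avoids a manual conversion pass.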

Encoding issues (mojibake) are a different failure mode than bad splitting; the card "Text encodings (UTF-8, Latin-1, Windows-1252)" in this deck covers byte-to-text contracts.
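
For a quick picture of what mojibake looks like, the sketch below encodes a place name as UTF-8 and then decodes the same bytes with the wrong codec. The variable names are illustrative only.

```python
original = "Zürich"

# The bytes as they would sit in a UTF-8 encoded file.
raw_bytes = original.encode("utf-8")

# Decoding with the wrong codec does not raise; it yields garbled text.
wrong = raw_bytes.decode("latin-1")

# Decoding with the codec the file was written in restores the text.
right = raw_bytes.decode("utf-8")

print(wrong)  # ZÃ¼rich
print(right)  # Zürich
```

Note that the wrong decode succeeds silently, which is why this failure tends to surface late, for example when a join on city names finds no matches.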

Further reading: Real Python on CSV and pathlib [1]; pandas read_csv reference [2].


Sources

Tasks
Question 1

You receive a CSV where some text fields contain commas inside double quotes. What is the main reason plain line.split(',') is unsafe for parsing?

Hint

Think about how spreadsheets export free-text columns.

Question 2

Why do data science tutorials often pass encoding='utf-8' (or rely on UTF-8 defaults) when reading CSV?

Hint

Think about accented letters in European place names.

Question 3

Implement row_numeric_sum(line: str) -> float that parses one CSV data line with two columns name,value (value is numeric and may be an int or a float) and returns only the numeric value as a float. The line has no comma inside the name.

Example: "Zurich,12.5" returns 12.5.

Submit the full function; tests call it in expression mode.

Hint

Split once from the left so the name stays intact.

Question 4

Using pandas, implement csv_row_count(csv_text: str) -> int that returns the number of data rows for UTF-8 CSV text in memory. Ignore blank lines and rows whose first non-space character in the first column is #.

Example: "city,pop\nZurich,400\n# comment,0\nBern,130\n\n" returns 2.

Use pd.read_csv with io.StringIO. Submit the full function; tests use expression mode on the Studdyco runner (pandas is installed in the default code sandbox image).

Hint

Read first, then filter rows using a boolean mask on the first column.
