CSV files and tabular I/O in Python
Most data still arrives as rows in a text file: someone exports a spreadsheet, a sensor dumps a log, or an open-data portal offers a download. You pull it in, turn strings into numbers, and aggregate. The first trap is treating each line as "split on commas." A free-text column might hold "Zurich, downtown" inside quotes; a naive split breaks that single field in two, so the row gains an extra column, every later value shifts one position to the right, and your pipeline quietly drifts.
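To make the quoting trap concrete, here is a minimal sketch that compares a naive split with Python's standard-library csv.reader. The sample text and its column names are made up for illustration.

```python
import csv
import io

# Hypothetical sample export: the second field holds a comma inside quotes.
raw = 'id,location,temp\n1,"Zurich, downtown",21.5\n'

# Naive approach: split every line on commas and ignore the quoting rules.
naive_rows = [line.split(",") for line in raw.splitlines()]

# csv.reader honours the quotes and keeps the field in one piece.
parsed_rows = list(csv.reader(io.StringIO(raw)))

print(len(naive_rows[0]), len(naive_rows[1]))  # header and data row disagree
print(parsed_rows[1])
```

With the naive split, the header has three fields but the data row has four, which is exactly the silent drift described above. When reading from a real file rather than a string, the csv documentation recommends opening it with newline="".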
The csv module and pandas.read_csv exist because real CSV has rules for quoting and delimiters. Once the table is in memory, a second trap appears: European exports often use a comma as the decimal separator (1,5 instead of 1.5), so a direct cast to a number fails or misreads the value, and you learn to normalize before casting. That journey, from messy export to trustworthy dataframe, is what this stack is for.
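The normalize-before-casting step can be sketched with the standard library alone. The sample data, the semicolon delimiter, and the helper name to_float are assumptions for illustration; the helper also assumes the values contain no thousands separators.

```python
import csv
import io

# Hypothetical European-style export: semicolon-delimited, decimal commas.
raw = "sensor;reading\nA;1,5\nB;2,25\n"


def to_float(text: str) -> float:
    """Convert a decimal-comma string such as '1,5' to a float.

    Assumes the value has no thousands separators.
    """
    return float(text.strip().replace(",", "."))


# Tell the reader which delimiter the file uses, then normalize each value.
reader = csv.DictReader(io.StringIO(raw), delimiter=";")
readings = {row["sensor"]: to_float(row["reading"]) for row in reader}

print(readings)
```

If you load the file with pandas instead, read_csv accepts sep and decimal parameters (for example sep=";" and decimal=","), which handle the same normalization at load time.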
Encoding issues (mojibake) are a different failure mode from bad splitting; the card "Text encodings (UTF-8, Latin-1, Windows-1252)" in this deck covers byte-to-text contracts.
Further reading: Real Python on CSV and pathlib [1]; pandas read_csv reference [2].
Card Info
- Topic: Data Science Praktikum
- Difficulty: Beginner