Dates and times in pandas
Dates usually arrive as strings, get stored with the generic object dtype, and break as soon as you try to subtract them. The entry point is pd.to_datetime(), which parses strings like '2025-03-15' or '15/03/2025' into proper datetime64 values. Once a column is converted, the .dt accessor gives you a clean API for extraction: df['date'].dt.year, .dt.month, .dt.dayofweek (0 = Monday, 6 = Sunday), .dt.quarter. On an unconverted string column, .dt raises an AttributeError and you are stuck splitting strings by hand.
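A minimal sketch of the conversion and extraction steps described above. The column name date and the three sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: dates stored as plain strings.
df = pd.DataFrame({"date": ["2025-03-15", "2025-07-04", "2025-12-31"]})

# Convert the string column to datetime64 so the .dt accessor becomes available.
df["date"] = pd.to_datetime(df["date"])

# Extract calendar components through the .dt accessor.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["dayofweek"] = df["date"].dt.dayofweek  # 0 = Monday, 6 = Sunday
df["quarter"] = df["date"].dt.quarter

print(df)
```

With this data, 2025-03-15 is a Saturday, so its dayofweek is 5.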
For time series analysis, resample() is the time-based equivalent of groupby. After setting the date column as the index with df.set_index('date'), df.resample('ME').sum() groups all rows within each calendar month and sums them. Common frequency strings: 'D' (daily), 'W' (weekly), 'ME' (month end), 'QE' (quarter end). The 'ME' and 'QE' aliases were introduced in pandas 2.2; older versions use 'M' and 'Q' instead. If you need a regular date axis for testing, pd.date_range('2025-01-01', periods=12, freq='ME') generates twelve month-end timestamps.
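The monthly resampling described above could look roughly like this. The sales column and its values are invented for the example, and the small version check is only an assumption about how to stay compatible with pandas releases before 2.2, where the month-end alias is 'M' rather than 'ME'.

```python
import pandas as pd

# Pick the month-end alias that matches the installed pandas version.
major_minor = tuple(int(part) for part in pd.__version__.split(".")[:2])
MONTH_END = "ME" if major_minor >= (2, 2) else "M"

# Hypothetical data: one sale per day for the first quarter of 2025.
days = pd.date_range("2025-01-01", "2025-03-31", freq="D")
df = pd.DataFrame({"date": days, "sales": 1})

# resample() needs a datetime index, so set it first, then sum per month.
monthly = df.set_index("date").resample(MONTH_END).sum()
print(monthly)

# A regular axis of twelve month-end timestamps, handy for tests.
axis = pd.date_range("2025-01-01", periods=12, freq=MONTH_END)
print(axis[0], axis[-1])
```

Each row of monthly is labelled with the last day of its month, and the sums here are simply the number of days in January, February and March.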
Date arithmetic also becomes natural: (pd.Timestamp.now() - df['date']).dt.days gives you the age of each record in whole days. Subtracting two datetime columns produces a Series of Timedelta values (dtype timedelta64), so you can read .dt.days or call .dt.total_seconds() without manual epoch math.
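As a rough illustration of the arithmetic above, the sketch below uses two hypothetical columns, ordered and shipped, and a fixed reference timestamp instead of pd.Timestamp.now() so the result is reproducible.

```python
import pandas as pd

# Hypothetical order data with two datetime columns.
df = pd.DataFrame({
    "ordered": pd.to_datetime(["2025-03-01 08:00", "2025-03-10 12:00"]),
    "shipped": pd.to_datetime(["2025-03-03 20:00", "2025-03-10 18:30"]),
})

# Subtracting two datetime columns yields a timedelta64 Series.
delay = df["shipped"] - df["ordered"]
df["delay_days"] = delay.dt.days                # whole days only
df["delay_seconds"] = delay.dt.total_seconds()  # full duration in seconds

# Age of each record relative to a fixed point in time.
as_of = pd.Timestamp("2025-04-01")
df["age_days"] = (as_of - df["ordered"]).dt.days

print(df)
```

Note that .dt.days truncates to whole days: the first order took 2 days and 12 hours to ship, which shows up as 2 in delay_days and 216000.0 in delay_seconds.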
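A common gotcha: if your CSV has dates like 01/02/2025, is that January 2nd or February 1st? By default pandas reads it month-first, as January 2nd. Pass dayfirst=True to pd.to_datetime for European dd/mm/yyyy formats, or specify format='%d/%m/%Y' to be explicit. The explicit format is the stricter option, since it raises an error on values that do not match.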
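The difference between the default, dayfirst=True and an explicit format can be sketched as follows. The sample strings are invented, and the last part only illustrates that a mismatching value is rejected when a format is given.

```python
import pandas as pd

# Hypothetical European-style dates (dd/mm/yyyy).
raw = pd.Series(["01/02/2025", "15/03/2025"])

# Default parsing of an ambiguous string is month-first: January 2nd.
ambiguous = pd.to_datetime("01/02/2025")

# dayfirst=True tells pandas to prefer the day-first reading: February 1st.
preferred = pd.to_datetime(raw, dayfirst=True)

# An explicit format leaves no room for guessing.
explicit = pd.to_datetime(raw, format="%d/%m/%Y")

# With an explicit format, a value in another layout raises ValueError.
try:
    pd.to_datetime(pd.Series(["2025-02-01"]), format="%d/%m/%Y")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(ambiguous, preferred.tolist(), explicit.tolist(), mismatch_raised)
```

For data you do not control, the explicit format is usually the safer choice, because a file that silently switches layout fails loudly instead of producing wrong dates.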
Time series guide: [1], date_range docs: [2].
Card Info
- Topic: Data Science Praktikum
- Difficulty: Beginner