datetime#

In this section I focus on working with dates and times in pandas.

import numpy as np
import pandas as pd

test_series = pd.Series(
    np.random.choice(
        pd.date_range("2021-01-01", "2022-12-31"), 
        size=40
))

date_range#

`pd.date_range` generates a sequence of evenly spaced dates between given boundaries.

Basic#

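In the simplest case you pass a start and an end date and get every day between them, both ends included. The result is a `DatetimeIndex`, but it is easy to convert to other common container types; here it is turned into a list with `.to_list()`.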

pd.date_range("2020-01-01", "2020-01-07").to_list()
[Timestamp('2020-01-01 00:00:00'),
 Timestamp('2020-01-02 00:00:00'),
 Timestamp('2020-01-03 00:00:00'),
 Timestamp('2020-01-04 00:00:00'),
 Timestamp('2020-01-05 00:00:00'),
 Timestamp('2020-01-06 00:00:00'),
 Timestamp('2020-01-07 00:00:00')]
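A small sketch of two related options: instead of an end date you can pass `periods` (the number of dates to generate), and besides `.to_list()` the index can be turned into a Series or a NumPy array. The variable names are just for illustration.

```python
import pandas as pd

# The same seven days as above, described by a start date and a number of periods
dates = pd.date_range("2020-01-01", periods=7)

as_series = dates.to_series()  # Series whose index and values are both the dates
as_array = dates.to_numpy()    # NumPy array with a datetime64 dtype

print(type(dates).__name__)    # DatetimeIndex
print(as_series.head(3))
print(as_array[:3])
```

`to_series()` keeps the dates as the index as well, which is handy for the `.dt` examples further down.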

freq#

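The `freq` parameter sets the step between consecutive dates. The following DataFrame compares a daily and a weekly step; each column name shows the value passed to `freq`. Note that `"W"` means weekly on Sundays (it is an alias for `"W-SUN"`), which is why the weekly column starts on 2021-01-03, the first Sunday after the start date, rather than on 2021-01-01.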

pd.DataFrame({
    "Days 'D'": pd.date_range("2021-01-01", "2021-01-10", freq="D"),
    "Weeks 'W'": pd.date_range("2021-01-01", "2021-03-07", freq="W"),
})
    Days 'D'  Weeks 'W'
0 2021-01-01 2021-01-03
1 2021-01-02 2021-01-10
2 2021-01-03 2021-01-17
3 2021-01-04 2021-01-24
4 2021-01-05 2021-01-31
5 2021-01-06 2021-02-07
6 2021-01-07 2021-02-14
7 2021-01-08 2021-02-21
8 2021-01-09 2021-02-28
9 2021-01-10 2021-03-07
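A few other `freq` values, as a quick sketch: a multiplier (`"2D"`), business days (`"B"`), weekly anchored on a different weekday (`"W-MON"`) and month start (`"MS"`). The `examples` dictionary is only a container for the demo.

```python
import pandas as pd

# A few other frequency aliases, each generating three dates from 2021-01-01 (a Friday)
examples = {
    "2D": pd.date_range("2021-01-01", periods=3, freq="2D"),        # every second day
    "B": pd.date_range("2021-01-01", periods=3, freq="B"),          # business days, Mon-Fri
    "W-MON": pd.date_range("2021-01-01", periods=3, freq="W-MON"),  # weekly, on Mondays
    "MS": pd.date_range("2021-01-01", periods=3, freq="MS"),        # first day of each month
}

for alias, dates in examples.items():
    print(alias, dates.strftime("%Y-%m-%d").to_list())
```

I left month end out on purpose: its alias was renamed from `"M"` to `"ME"` in pandas 2.2, so the accepted spelling depends on your pandas version.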

Random dates#

The following snippet generates a `pandas.Series` of random dates within given boundaries. I use this pattern in the other examples as well, so the `test_series` created here serves as the default test variable for the other sections. Note that `np.random.choice` samples with replacement by default, so the same date can appear more than once.

import pandas as pd
import numpy as np

start_date = '2021-01-01'
end_date = '2021-12-31'
num_dates = 10

test_series = pd.Series(
    np.random.choice(
        pd.date_range(start_date, end_date), 
        size=num_dates
))
test_series
0   2021-03-24
1   2021-10-19
2   2021-08-04
3   2021-02-08
4   2021-04-10
5   2021-08-31
6   2021-06-22
7   2021-12-19
8   2021-09-20
9   2021-08-08
dtype: datetime64[ns]
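If you need distinct dates and a reproducible draw, one option (a sketch, not the approach used elsewhere on this page) is pandas' own `Series.sample`, which samples without replacement by default; `random_state=42` is an arbitrary seed.

```python
import pandas as pd

start_date = "2021-01-01"
end_date = "2021-12-31"
num_dates = 10

# Draw num_dates distinct days; random_state makes the draw reproducible
test_series = (
    pd.Series(pd.date_range(start_date, end_date))
    .sample(n=num_dates, random_state=42)
    .reset_index(drop=True)  # renumber the rows 0..num_dates-1
)
test_series
```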

Extracting components#

Extracting a component of a date, such as the day, month, or week, from a pandas Series is a common task, so here I show some options. Usually you access these components through the `dt` accessor of the Series.
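As a compact overview, the sketch below pulls several components out of a small hand-written Series through `dt`. Most of them are attributes; `month_name()` is a method. The dates and the `components` name are only for illustration.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2021-03-24", "2021-10-19", "2021-08-04"]))

# Each dt attribute returns a Series aligned with the original one
components = pd.DataFrame({
    "date": dates,
    "year": dates.dt.year,
    "month": dates.dt.month,
    "day": dates.dt.day,
    "quarter": dates.dt.quarter,
    "month name": dates.dt.month_name(),  # a method, not an attribute
})
print(components)
```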

dt.day_of_week#

`dt.day_of_week` returns the day of the week as an integer: 0 for Monday through 6 for Sunday.

The following example uses the week in which this page was written.

week_range = pd.date_range("2023-08-28", "2023-09-03").to_series()

pd.DataFrame({
    "Original date" : week_range,
    "Day of the week" : week_range.dt.day_of_week
})
           Original date  Day of the week
2023-08-28    2023-08-28                0
2023-08-29    2023-08-29                1
2023-08-30    2023-08-30                2
2023-08-31    2023-08-31                3
2023-09-01    2023-09-01                4
2023-09-02    2023-09-02                5
2023-09-03    2023-09-03                6
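Two things that often follow: getting names instead of numbers with `dt.day_name()`, and using the numeric value to filter rows. A short sketch on the same week, where `day_of_week >= 5` keeps Saturday (5) and Sunday (6):

```python
import pandas as pd

week_range = pd.date_range("2023-08-28", "2023-09-03").to_series()

# English day names ("Monday", ...) instead of 0..6
day_names = week_range.dt.day_name()

# Keep only the weekend: Saturday is 5, Sunday is 6
weekend = week_range[week_range.dt.day_of_week >= 5]

print(day_names.tolist())
print(weekend)
```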

week of year#

dt.isocalendar().week#

`dt.isocalendar()` returns a DataFrame with the ISO 8601 `year`, `week`, and `day` of each date; its `week` column gives the week number.

test_weeks = pd.date_range("2021-01-01", "2021-04-01", freq="W").to_series()
test_weeks.dt.isocalendar().week.rename("week number").to_frame()
            week number
2021-01-03           53
2021-01-10            1
2021-01-17            2
2021-01-24            3
2021-01-31            4
2021-02-07            5
2021-02-14            6
2021-02-21            7
2021-02-28            8
2021-03-07            9
2021-03-14           10
2021-03-21           11
2021-03-28           12
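The week number alone can be misleading around New Year: 2021-01-03 gets week 53 even though it is in January. Looking at the whole `isocalendar()` result makes this explicit, because the `year` column tells which ISO year the week belongs to. A minimal sketch with two hand-picked dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2021-01-03", "2021-01-04"]))

# DataFrame with columns year, week and day (ISO day: Monday=1 ... Sunday=7)
iso = dates.dt.isocalendar()
print(iso)
```

Here 2021-01-03 is a Sunday in ISO week 53 of 2020, while Monday 2021-01-04 starts week 1 of 2021. Note that the ISO `day` counts from 1 (Monday), unlike `dt.day_of_week`, which counts from 0.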

Note: The first days of a year may belong to the last week (week 52 or 53) of the previous year, and the last days of a year may belong to week 1 of the next one. The pandas documentation does not describe the exact algorithm in detail, so in the next cell I went through the days around New Year for several years. It turns out that a week belongs to the year that contains most of its days (at least four of the seven) and is numbered within that year. This matches the ISO 8601 rule: weeks start on Monday, and week 1 of a year is the week that contains its first Thursday.

from datetime import datetime

from IPython.display import HTML, display

# For several years, list the days around New Year with their day of week and ISO week
for y in range(2012, 2017):

    next_y = y + 1

    days = pd.date_range(
        datetime(y, 12, 28),
        datetime(next_y, 1, 3), freq="D"
    ).to_series()

    display(HTML(f"<p style='font-size:150%'>======{y}-{next_y}======</p>"))
    display(pd.DataFrame({
        "Day": days,
        "Day of week": days.dt.day_of_week,
        "Week of year": days.dt.isocalendar().week
    }))

======2012-2013======

                  Day  Day of week  Week of year
2012-12-28 2012-12-28            4            52
2012-12-29 2012-12-29            5            52
2012-12-30 2012-12-30            6            52
2012-12-31 2012-12-31            0             1
2013-01-01 2013-01-01            1             1
2013-01-02 2013-01-02            2             1
2013-01-03 2013-01-03            3             1

======2013-2014======

                  Day  Day of week  Week of year
2013-12-28 2013-12-28            5            52
2013-12-29 2013-12-29            6            52
2013-12-30 2013-12-30            0             1
2013-12-31 2013-12-31            1             1
2014-01-01 2014-01-01            2             1
2014-01-02 2014-01-02            3             1
2014-01-03 2014-01-03            4             1

======2014-2015======

                  Day  Day of week  Week of year
2014-12-28 2014-12-28            6            52
2014-12-29 2014-12-29            0             1
2014-12-30 2014-12-30            1             1
2014-12-31 2014-12-31            2             1
2015-01-01 2015-01-01            3             1
2015-01-02 2015-01-02            4             1
2015-01-03 2015-01-03            5             1

======2015-2016======

                  Day  Day of week  Week of year
2015-12-28 2015-12-28            0            53
2015-12-29 2015-12-29            1            53
2015-12-30 2015-12-30            2            53
2015-12-31 2015-12-31            3            53
2016-01-01 2016-01-01            4            53
2016-01-02 2016-01-02            5            53
2016-01-03 2016-01-03            6            53

======2016-2017======

                  Day  Day of week  Week of year
2016-12-28 2016-12-28            2            52
2016-12-29 2016-12-29            3            52
2016-12-30 2016-12-30            4            52
2016-12-31 2016-12-31            5            52
2017-01-01 2017-01-01            6            52
2017-01-02 2017-01-02            0             1
2017-01-03 2017-01-03            1             1
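The rule observed above can also be checked in code over a longer range. A minimal sketch, assuming the ISO 8601 definition (weeks start on Monday, and a week belongs to the year that contains its Thursday, i.e. most of its days): find the Thursday of each date's week, take its calendar year as the ISO year, derive the week number from its day of year, and compare both with `dt.isocalendar()` for every day from 2012 to 2017.

```python
import pandas as pd

# Every day from 2012 through 2017
days = pd.Series(pd.date_range("2012-01-01", "2017-12-31", freq="D"))

# Thursday of the Monday-based week each date belongs to:
# step back to Monday (day_of_week days), then forward 3 days
thursday = (
    days
    - pd.to_timedelta(days.dt.day_of_week, unit="D")
    + pd.Timedelta(days=3)
)

# ISO year = calendar year of that Thursday;
# ISO week = which 7-day block (counted from January 1) contains that Thursday
iso_year = thursday.dt.year
iso_week = (thursday.dt.dayofyear - 1) // 7 + 1

iso = days.dt.isocalendar()
print((iso_year == iso["year"]).all(), (iso_week == iso["week"]).all())
```

The week formula works because week 1 contains the first Thursday, so that Thursday falls on days 1 to 7 of the year, the Thursday of week 2 on days 8 to 14, and so on. Both comparisons should come out `True`.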

weekofyear#

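A pandas `Timestamp` has a `weekofyear` attribute (an alias of `week`) that you can combine with the `apply` method, as in the next example. It returns the same ISO week number as `dt.isocalendar().week`. The vectorized `Series.dt.weekofyear` accessor was deprecated in favor of `dt.isocalendar().week` and removed in pandas 2.0.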

test_weeks = pd.date_range("2021-01-01", "2021-04-01", freq="W").to_series()
test_weeks.apply(lambda val: val.weekofyear).rename("week number").to_frame()
            week number
2021-01-03           53
2021-01-10            1
2021-01-17            2
2021-01-24            3
2021-01-31            4
2021-02-07            5
2021-02-14            6
2021-02-21            7
2021-02-28            8
2021-03-07            9
2021-03-14           10
2021-03-21           11
2021-03-28           12
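To confirm that the element-by-element `weekofyear` and the vectorized `dt.isocalendar().week` agree, a quick sketch that computes both on the same weekly range and compares them. On large Series the vectorized form is usually preferable, since `apply` calls the Python lambda once per element.

```python
import pandas as pd

test_weeks = pd.date_range("2021-01-01", "2021-04-01", freq="W").to_series()

via_apply = test_weeks.apply(lambda ts: ts.weekofyear)  # one Timestamp at a time
via_iso = test_weeks.dt.isocalendar().week              # whole Series at once

print((via_apply == via_iso).all())
```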