Data types#

This page covers Python's data types: the built-in types, the properties that distinguish them, datetime types, and subtyping.

Basic datatypes#

Here are the basic datatypes that are implemented in Python by default. This is just a brief overview of the basic datatypes - see the specific page for more information.

| Data Type | Mutable | Collection | Ordered | Description |
|---|---|---|---|---|
| int | No | No | - | Integer values (e.g., 1, -10) |
| float | No | No | - | Floating-point numbers (e.g., 3.14) |
| str | No | Yes | Yes | Strings (e.g., "hello") |
| bool | No | No | - | Boolean values (True, False) |
| list | Yes | Yes | Yes | Lists (e.g., [1, 2, 3]) |
| tuple | No | Yes | Yes | Tuples (e.g., (1, 2, 3)) |
| dict | Yes | Yes | Yes (>=3.7) | Dictionaries (e.g., {"key": "value"}) |
| set | Yes | Yes | No | Sets (e.g., {1, 2, 3}) |
| frozenset | No | Yes | No | Immutable sets (e.g., frozenset([1, 2, 3])) |
| NoneType | No | No | - | Represents the absence of a value |

The table above lists many Python datatypes. Now let us describe more precisely the properties that distinguish them.

Mutable#

The main feature of mutable datatypes is that their content can be changed in place.

The following example shows how to add another element to the Python list. The same list now has different contents - that’s why it’s mutable.

original_list = [1, 2, 3]
original_list.append(4)
original_list
[1, 2, 3, 4]
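A practical consequence of mutability is aliasing: two names bound to the same list see the same in-place changes. The following is a minimal sketch; the variable names (`numbers_a`, `numbers_b`, `numbers_copy`) are hypothetical and chosen only for illustration.

```python
# Two names bound to the same list object (hypothetical variable names).
numbers_a = [1, 2, 3]
numbers_b = numbers_a            # no copy is made; both names share one object
id_before = id(numbers_a)

numbers_a.append(4)              # in-place modification

same_object = id(numbers_a) == id_before
print(same_object)               # True: the object's identity is unchanged
print(numbers_b)                 # [1, 2, 3, 4]: the change is visible via both names

# An explicit copy is a separate object and can diverge from the original.
numbers_copy = numbers_a.copy()
numbers_copy.append(5)
print(numbers_a)                 # [1, 2, 3, 4]
print(numbers_copy)              # [1, 2, 3, 4, 5]
```

If independent data is needed, copying explicitly (as with `list.copy()` here) avoids surprises from shared references.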

However, assigning a new value to a variable that holds an immutable object, such as an integer, does not change the original object. Instead, Python binds the variable name to a different integer object. Integers are immutable in Python: once an integer object is created, its value cannot be changed.

The following example shows that after you change the value of an integer variable (or a variable of any other immutable type), the name refers to a different object.

a = 5
print(id(a))
a = 7
print(id(a))
129449186738544
129449186738608
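Immutable types also reject attempts to modify them in place. The sketch below, using hypothetical variable names, tries item assignment on a tuple and a string and records the resulting exceptions.

```python
point = (1, 2, 3)        # tuple: immutable
greeting = "hello"       # str: immutable

errors = []
try:
    point[0] = 10        # item assignment is not supported by tuples
except TypeError as exc:
    errors.append(type(exc).__name__)

try:
    greeting[0] = "H"    # nor by strings
except TypeError as exc:
    errors.append(type(exc).__name__)

print(errors)            # ['TypeError', 'TypeError']

# "Changing" an immutable value means building a new object.
capitalized = "H" + greeting[1:]
print(capitalized)               # Hello
print(capitalized is greeting)   # False
print(greeting)                  # hello (the original is untouched)
```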

Collection#

In Python, a “collection” refers to a group of multiple elements that are stored together and can be manipulated as a unit. Collections are fundamental data structures that allow you to manage and organize data efficiently.

Each collection has its own complexity for different operations. Check out “Time complexity” for more information.
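One way to check whether a value is a collection is to test it against the abstract base class `collections.abc.Collection`, which matches any object that supports `len()`, iteration, and the `in` operator. The sketch below applies it to one sample value per basic datatype; the sample values and variable names are chosen only for illustration.

```python
from collections.abc import Collection

# One sample value for each basic datatype.
samples = [1, 3.14, "hello", True, [1, 2], (1, 2), {"k": "v"}, {1, 2},
           frozenset([1, 2]), None]

# Map each type name to whether its instances are collections.
is_collection = {
    type(value).__name__: isinstance(value, Collection) for value in samples
}
collection_types = [name for name, flag in is_collection.items() if flag]
print(collection_types)
# ['str', 'list', 'tuple', 'dict', 'set', 'frozenset']

# The operations that every collection supports:
basket = ["apple", "pear", "plum"]
print(len(basket))                          # 3
print("pear" in basket)                     # True
print([fruit.upper() for fruit in basket])  # ['APPLE', 'PEAR', 'PLUM']
```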

Ordered#

Some collections in Python preserve the order of their elements, while others do not.

Ordered collections allow you to define an order. The following example shows that the list retains the same order that was specified when it was created. This property of the list means that it’s ordered.

print(['a', 'b', 'c', 'd', 'e'])
print(['e', 'd', 'c', 'b', 'a'])
['a', 'b', 'c', 'd', 'e']
['e', 'd', 'c', 'b', 'a']

Here is the same example with an unordered set. It ignores the order in which the elements were written and stores them in an order determined by their hash values. For strings, this order can differ between interpreter runs.

print({'a', 'b', 'c', 'd', 'e'})
print({'e', 'd', 'c', 'b', 'a'})
{'c', 'd', 'a', 'b', 'e'}
{'c', 'd', 'a', 'b', 'e'}
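Ordering also affects how collections compare. The following sketch shows that equality of lists and tuples depends on element order, while equality of sets does not, and that a dict iterates in insertion order (Python 3.7 and later). The variable names are hypothetical.

```python
# Ordered collections: the same elements in a different order are not equal.
print([1, 2, 3] == [3, 2, 1])    # False
print((1, 2) == (2, 1))          # False

# Unordered collections: only membership matters.
print({1, 2, 3} == {3, 2, 1})    # True

# dict keeps insertion order when iterating...
inventory = {}
inventory["b"] = 1
inventory["a"] = 2
inventory["c"] = 3
keys_in_order = list(inventory)
print(keys_in_order)             # ['b', 'a', 'c']

# ...but dict equality still ignores order.
print({"a": 1, "b": 2} == {"b": 2, "a": 1})   # True

# To get a predictable order out of a set, sort it explicitly.
letters = sorted({"e", "d", "c", "b", "a"})
print(letters)                   # ['a', 'b', 'c', 'd', 'e']
```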

Datetime#

Datetime data types are commonly used but can sometimes be tricky. This section covers the datetime data types in core Python. For more details, check the corresponding page in the official Python documentation or the dedicated page on this site.

The following classes are available for working with datetime data:

| Class | Description |
|---|---|
| date | Represents a date (year, month, and day) without time information. |
| time | Represents a time (hour, minute, second, microsecond) without any date. |
| datetime | Combines both date and time information (year, month, day, hour, minute, second, microsecond). |
| timedelta | Represents a duration, i.e., the difference between two dates or times. |
| tzinfo | A base class for dealing with time zones, used to handle timezone conversions. |
| timezone | A subclass of tzinfo that provides fixed offset time zones and UTC time zone. |
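As a rough illustration of how these classes fit together, the sketch below builds a timezone-aware `datetime`, converts it to a fixed-offset zone created with `timezone`, and splits it into `date` and `time` parts. The specific moment and the +03:00 offset are arbitrary values chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# A timezone-aware moment in UTC.
utc_moment = datetime(2022, 10, 20, 12, 0, tzinfo=timezone.utc)

# timezone wraps a fixed offset expressed as a timedelta.
plus_three = timezone(timedelta(hours=3))
local_moment = utc_moment.astimezone(plus_three)

print(local_moment.isoformat())      # 2022-10-20T15:00:00+03:00
print(local_moment == utc_moment)    # True: same instant, different offset

# A datetime can be split into its date and time components.
print(utc_moment.date(), utc_moment.time())   # 2022-10-20 12:00:00
```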


The following cells show what are, in my experience, the most commonly used of these features:

  • Calculating periods between dates (time deltas).

  • Formatting dates to specific formats.

The following cell creates two date objects:

from datetime import date
begin = date(2022, 10, 20)
end = date(2022, 3, 4)

begin, end
(datetime.date(2022, 10, 20), datetime.date(2022, 3, 4))

We can easily compute the period between them just by using the - operator.

begin - end
datetime.timedelta(days=230)
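A `timedelta` can be inspected and used in further arithmetic. The sketch below reuses the same two dates and shows a few common operations; the name `period` is introduced here only for illustration.

```python
from datetime import date, timedelta

begin = date(2022, 10, 20)
end = date(2022, 3, 4)

period = begin - end
print(period.days)               # 230
print(period.total_seconds())    # 19872000.0

# Subtracting in the other direction gives a negative duration.
print((end - begin).days)        # -230

# A timedelta can be added to a date to get another date.
two_weeks_later = end + timedelta(weeks=2)
print(two_weeks_later)           # 2022-03-18
print(end + period == begin)     # True
```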

The following cell uses the strftime method to format a date as a string. Because a date carries no time information, the time directives are rendered as midnight.

begin.strftime('%a %d %b %Y, %I:%M%p')
'Thu 20 Oct 2022, 12:00AM'
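The reverse operation, parsing a string into a datetime, uses `datetime.strptime` with the same format directives. The sketch below shows a round trip. It uses numeric directives only, because names produced by `%a` and `%b` depend on the locale. The sample moment is an arbitrary value.

```python
from datetime import datetime

moment = datetime(2022, 10, 20, 14, 30)

# Format to a string, then parse it back with the same format.
fmt = "%Y-%m-%d %H:%M"
text = moment.strftime(fmt)
print(text)                      # 2022-10-20 14:30

parsed = datetime.strptime(text, fmt)
print(parsed == moment)          # True

# ISO 8601 strings have dedicated helpers that need no format string.
iso_text = moment.isoformat()
print(iso_text)                  # 2022-10-20T14:30:00
print(datetime.fromisoformat(iso_text) == moment)   # True
```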

Subtypes#

Python supports the concept of subtyping. Formally, we say that a type T is a subtype of U if the following two conditions hold:

  • Every value of type T is also a valid value of type U.

  • Every operation (method or function) that can be performed on type U can also be performed on type T, with T maintaining all the guarantees of U.

Consider an example where T=bool and U=int. Anything you can do with an int you can also do with a bool.

print(sum([True, False, True, False]))
print(True**False)
2
1

You can check whether T is a subtype of U with the function issubclass(<T>, <U>), which returns True if T is a subclass of U.

print("Is bool subclass of int -", issubclass(bool, int))
print("Is list subclass of int -", issubclass(list, int))
Is bool subclass of int - True
Is list subclass of int - False
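Two related details are worth a short sketch. `isinstance` performs the same check for a value rather than a type, and `issubclass` can return True even when the parent is not in the class's inheritance chain, because abstract base classes such as `collections.abc.Sequence` accept registered ("virtual") subclasses.

```python
from collections.abc import Sequence

# isinstance checks a value against a type, following the subtype relation.
print(isinstance(True, int))         # True

# bool inherits from int directly, which is visible in its MRO.
bool_parents = [cls.__name__ for cls in bool.__mro__]
print(bool_parents)                  # ['bool', 'int', 'object']

# list counts as a subclass of the Sequence ABC...
print(issubclass(list, Sequence))    # True

# ...although Sequence is not part of list's inheritance chain.
list_parents = [cls.__name__ for cls in list.__mro__]
print(list_parents)                  # ['list', 'object']
```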

Here is code that generates a table in which each element \(r_{ij}\) marks whether the type of the \(i\)-th row is a subtype of the type of the \(j\)-th column.

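import numbers
import collections.abc

# Assumption: the code runs in a Jupyter notebook, where HTML() renders the table.
from IPython.display import HTML

# Types to compare: each row type is checked against each column type.
my_types = [
    bool, int, float, complex, numbers.Number,
    list, bytearray, tuple, bytes, set, frozenset, dict,
    collections.abc.MutableSequence,
    collections.abc.Sequence, collections.abc.Set,
    collections.abc.Mapping
]

# Wrap cell content in a <td> with the given background color.
cell_wrapper = (
    lambda content, color:
    f"<td style='background:{color};text-align:center'>{content}</td>"
)

# issubclasses[i][j] is the cell for "my_types[i] is a subtype of my_types[j]".
issubclasses = [
    [
        (
            cell_wrapper('✓', "green")
            if issubclass(t1, t2)
            else cell_wrapper('x', "red")
        )
        if t1 is not t2 else cell_wrapper('-', "gray")
        for t2 in my_types
    ]
    for t1 in my_types
]

def type_ecraniser(s):
    # Turn "<class 'bool'>" into "bool" by removing the repr decoration.
    replacements = ('<', '>', 'class', "'")
    for symbol in replacements:
        s = s.replace(symbol, '')
    return s.strip()

# Header row: an empty corner cell followed by the column type names.
header = "".join([
    "<th>" + type_ecraniser(str(t)) + "</th>"
    for t in [""] + my_types
])
header = "<tr>" + header + "</tr>"

# Body rows: the row type name followed by its prepared cells.
content = "".join([
    (
        "<tr>" +
        f"<td>{type_ecraniser(str(my_types[i]))}</td>" +
        "".join(row) +
        "</tr>"
    )
    for i, row in enumerate(issubclasses)
])

HTML(f"<table>{header + content}</table>")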
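| | bool | int | float | complex | numbers.Number | list | bytearray | tuple | bytes | set | frozenset | dict | collections.abc.MutableSequence | collections.abc.Sequence | collections.abc.Set | collections.abc.Mapping |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bool | - | ✓ | x | x | ✓ | x | x | x | x | x | x | x | x | x | x | x |
| int | x | - | x | x | ✓ | x | x | x | x | x | x | x | x | x | x | x |
| float | x | x | - | x | ✓ | x | x | x | x | x | x | x | x | x | x | x |
| complex | x | x | x | - | ✓ | x | x | x | x | x | x | x | x | x | x | x |
| numbers.Number | x | x | x | x | - | x | x | x | x | x | x | x | x | x | x | x |
| list | x | x | x | x | x | - | x | x | x | x | x | x | ✓ | ✓ | x | x |
| bytearray | x | x | x | x | x | x | - | x | x | x | x | x | ✓ | ✓ | x | x |
| tuple | x | x | x | x | x | x | x | - | x | x | x | x | x | ✓ | x | x |
| bytes | x | x | x | x | x | x | x | x | - | x | x | x | x | ✓ | x | x |
| set | x | x | x | x | x | x | x | x | x | - | x | x | x | x | ✓ | x |
| frozenset | x | x | x | x | x | x | x | x | x | x | - | x | x | x | ✓ | x |
| dict | x | x | x | x | x | x | x | x | x | x | x | - | x | x | x | ✓ |
| collections.abc.MutableSequence | x | x | x | x | x | x | x | x | x | x | x | x | - | ✓ | x | x |
| collections.abc.Sequence | x | x | x | x | x | x | x | x | x | x | x | x | x | - | x | x |
| collections.abc.Set | x | x | x | x | x | x | x | x | x | x | x | x | x | x | - | x |
| collections.abc.Mapping | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | - |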