Data class

Python's standard library includes the dataclasses module. Its @dataclass decorator lets you build classes that mainly store data and generates common methods for them, such as __init__, __repr__, and __eq__. Refer to the official documentation for details.

from typing import Optional
from dataclasses import dataclass, field, asdict
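
As a minimal sketch of what these built-in tools look like in practice, the following example shows the methods that @dataclass generates by default. The Point class is hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Point:  # hypothetical example class
    x: int
    y: int

p = Point(1, 2)          # generated __init__ accepts the fields in order
print(p)                 # generated __repr__ -> Point(x=1, y=2)
print(p == Point(1, 2))  # generated __eq__ compares field values -> True
```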

Defining

When defining a dataclass, you can give attributes default values, which lets you omit them when creating an instance. However, a dataclass does not allow a mutable object, such as a list, dict, or set, to be used directly as a default value. Instead, use dataclasses.field(default_factory=<type>) to define the default for a mutable datatype.


The following cell shows that you can easily add a default value for an int field.

@dataclass
class SomeData:
    value: int = 10

SomeData()
SomeData(value=10)

But in the case of a mutable datatype, such as list, you'll get a ValueError. Note that the error is raised when the class is defined, before SomeData() is ever called.

@dataclass
class SomeData:
    items: list[str] = []
    
SomeData()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[30], line 2
      1 @dataclass
----> 2 class SomeData:
      3     items: list[str] = []
      5 SomeData()

File /usr/lib/python3.10/dataclasses.py:1184, in dataclass(cls, init, repr, eq, order, unsafe_hash, frozen, match_args, kw_only, slots)
   1181     return wrap
   1183 # We're called as @dataclass without parens.
-> 1184 return wrap(cls)

File /usr/lib/python3.10/dataclasses.py:1175, in dataclass.<locals>.wrap(cls)
   1174 def wrap(cls):
-> 1175     return _process_class(cls, init, repr, eq, order, unsafe_hash,
   1176                           frozen, match_args, kw_only, slots)

File /usr/lib/python3.10/dataclasses.py:955, in _process_class(cls, init, repr, eq, order, unsafe_hash, frozen, match_args, kw_only, slots)
    952         kw_only = True
    953     else:
    954         # Otherwise it's a field of some type.
--> 955         cls_fields.append(_get_field(cls, name, type, kw_only))
    957 for f in cls_fields:
    958     fields[f.name] = f

File /usr/lib/python3.10/dataclasses.py:812, in _get_field(cls, a_name, a_type, default_kw_only)
    810 # For real fields, disallow mutable defaults for known types.
    811 if f._field_type is _FIELD and isinstance(f.default, (list, dict, set)):
--> 812     raise ValueError(f'mutable default {type(f.default)} for field '
    813                      f'{f.name} is not allowed: use default_factory')
    815 return f

ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory

To use an empty list as the default, pass field(default_factory=list):

@dataclass
class SomeData:
    items: list[str] = field(default_factory=list)
    
SomeData()
SomeData(items=[])
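
The reason for this restriction can be illustrated with a short sketch: default_factory is called once per instance, so every instance gets its own list rather than sharing a single one. The Basket class below is hypothetical and used only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Basket:  # hypothetical example class
    items: list[str] = field(default_factory=list)

first = Basket()
second = Basket()
first.items.append("apple")  # modifies only the first instance's list

print(first.items)   # ['apple']
print(second.items)  # []
```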

Initialization logic

Because the @dataclass decorator generates the __init__ method for you, custom initialization logic needs a different home. For this purpose, use the __post_init__ method, which is specific to dataclasses: the generated __init__ calls it after all fields have been assigned.


Suppose you need a dataclass with two variables, minimum and maximum, and you want to check that minimum is not greater than maximum. The following cell shows a dataclass that prints a message when minimum > maximum. Note that printing only reports the problem: the instance is still created.

@dataclass
class SomeData:
    minimum: float
    maximum: float

    def __post_init__(self):
        if self.minimum > self.maximum:
            print("Minimum can't be bigger than maximum.")

SomeData(5, 4)
Minimum can't be bigger than maximum.
SomeData(minimum=5, maximum=4)
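
If the invalid instance should not be created at all, one option is to raise an exception from __post_init__ instead of printing. The following is a minimal sketch of that variant. The Range class and the choice of ValueError are assumptions made for illustration, not part of the example above.

```python
from dataclasses import dataclass

@dataclass
class Range:  # hypothetical example class
    minimum: float
    maximum: float

    def __post_init__(self):
        # Reject invalid values instead of only reporting them.
        if self.minimum > self.maximum:
            raise ValueError("minimum can't be bigger than maximum")

try:
    Range(5, 4)
except ValueError as error:
    print(f"Rejected: {error}")

valid = Range(1, 2)  # passes the check and is created normally
```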

To dict

Converting a dataclass to a dictionary is a common requirement. You can achieve this with the asdict function.


The following example shows basic usage of the asdict function.

@dataclass
class SomeData:
    a: int
    b: float
    c: bool

asdict(SomeData(10, 20.0, True))
{'a': 10, 'b': 20.0, 'c': True}

It's important to note that asdict works recursively: nested dataclasses are also transformed into dictionaries, so the result is a dictionary within a dictionary, not a dataclass within a dictionary.


The following cell uses one dataclass as an attribute of another. When asdict is applied to the outer instance, both dataclasses are converted into dictionaries in the result.

@dataclass
class InnerData:
    a: int
    b: float

@dataclass
class OuterData:
    id: InnerData
    a: int
    b: bool

data = OuterData(InnerData(10, 20), 30, True)
asdict(data)
{'id': {'a': 10, 'b': 20}, 'a': 30, 'b': True}
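
The same recursion applies to dataclasses stored inside lists, and the returned dictionary is a copy rather than a view of the instance. The sketch below illustrates both points. The Item and Order classes are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Item:  # hypothetical example class
    name: str

@dataclass
class Order:  # hypothetical example class
    items: list[Item] = field(default_factory=list)

order = Order([Item("pen"), Item("ink")])
result = asdict(order)
print(result)  # {'items': [{'name': 'pen'}, {'name': 'ink'}]}

# The result is a copy: changing it leaves the original instance untouched.
result["items"].append({"name": "paper"})
print(len(order.items))  # 2
```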