Data class#
Python's standard library includes the dataclasses module. Dataclasses let you build classes that mainly store data, and they generate common methods such as __init__, __repr__, and __eq__ for you. For details, refer to the official documentation.
from typing import Optional
from dataclasses import dataclass, field, asdict
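As a quick illustration of the built-in tools mentioned above, the sketch below shows three methods that the @dataclass decorator generates by default: __init__, __repr__, and __eq__. The Point class and its fields are hypothetical names chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int


p = Point(1, 2)          # generated __init__ accepts the fields in order
print(p)                 # generated __repr__ prints: Point(x=1, y=2)
print(p == Point(1, 2))  # generated __eq__ compares field values: True
```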
Defining#
When defining a dataclass, you can give attributes default values, so they can be omitted when an instance is created. However, a dataclass does not allow a mutable object such as a list, dict, or set to be used directly as a default value. Instead, you have to use dataclasses.field(default_factory=<type>) to define a default value for a mutable datatype.
The following cell shows that you can easily add a default value for an int field.
@dataclass
class SomeData:
    value: int = 10

SomeData()
SomeData(value=10)
But with a mutable default, such as a list, you'll get a ValueError as soon as the class is defined.
@dataclass
class SomeData:
    items: list[str] = []

SomeData()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[30], line 2
1 @dataclass
----> 2 class SomeData:
3 items: list[str] = []
5 SomeData()
File /usr/lib/python3.10/dataclasses.py:1184, in dataclass(cls, init, repr, eq, order, unsafe_hash, frozen, match_args, kw_only, slots)
1181 return wrap
1183 # We're called as @dataclass without parens.
-> 1184 return wrap(cls)
File /usr/lib/python3.10/dataclasses.py:1175, in dataclass.<locals>.wrap(cls)
1174 def wrap(cls):
-> 1175 return _process_class(cls, init, repr, eq, order, unsafe_hash,
1176 frozen, match_args, kw_only, slots)
File /usr/lib/python3.10/dataclasses.py:955, in _process_class(cls, init, repr, eq, order, unsafe_hash, frozen, match_args, kw_only, slots)
952 kw_only = True
953 else:
954 # Otherwise it's a field of some type.
--> 955 cls_fields.append(_get_field(cls, name, type, kw_only))
957 for f in cls_fields:
958 fields[f.name] = f
File /usr/lib/python3.10/dataclasses.py:812, in _get_field(cls, a_name, a_type, default_kw_only)
810 # For real fields, disallow mutable defaults for known types.
811 if f._field_type is _FIELD and isinstance(f.default, (list, dict, set)):
--> 812 raise ValueError(f'mutable default {type(f.default)} for field '
813 f'{f.name} is not allowed: use default_factory')
815 return f
ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory
You can achieve your goal by using field(default_factory=list):
@dataclass
class SomeData:
    items: list[str] = field(default_factory=list)

SomeData()
SomeData(items=[])
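One way to see what default_factory does: the factory is called once per instance, so every object gets its own list instead of sharing one. A minimal sketch, where the Basket class name is made up for this example:

```python
from dataclasses import dataclass, field


@dataclass
class Basket:  # hypothetical example class
    # default_factory=list calls list() for every new instance
    items: list[str] = field(default_factory=list)


first = Basket()
second = Basket()
first.items.append("apple")

print(first.items)                  # ['apple']
print(second.items)                 # [] (the second instance has its own list)
print(first.items is second.items)  # False
```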
Initialization logic#
Because dataclasses generate their own __init__ method, you need another way to implement initialization logic specific to your case. For this purpose, define the __post_init__ method, a dataclass-specific hook that the generated __init__ calls after the fields have been assigned.
Suppose you need a dataclass with two fields, minimum and maximum, and you want to check that minimum does not exceed maximum. The following cell shows a dataclass that prints a message when minimum > maximum.
@dataclass
class SomeData:
    minimum: float
    maximum: float

    def __post_init__(self):
        # Called by the generated __init__ after the fields are set.
        if self.minimum > self.maximum:
            print("Minimum can't be bigger than maximum.")

SomeData(5, 4)
Minimum can't be bigger than maximum.
SomeData(minimum=5, maximum=4)
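Printing a message does not stop the invalid instance from being created, as the output above shows. If the goal is to enforce the constraint, one possible variation is to raise an exception from __post_init__ instead. This is a sketch of that alternative, not the approach used in the cell above; the class name Range and the choice of ValueError are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Range:  # hypothetical example class
    minimum: float
    maximum: float

    def __post_init__(self):
        # Runs right after the generated __init__ has assigned the fields.
        if self.minimum > self.maximum:
            raise ValueError("minimum can't be bigger than maximum")


print(Range(4, 5))  # Range(minimum=4, maximum=5)

try:
    Range(5, 4)
except ValueError as error:
    print(f"Rejected: {error}")  # Rejected: minimum can't be bigger than maximum
```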
To dict#
Converting a dataclass to a dictionary is a common requirement. You can achieve this with the asdict function.
The following example shows basic usage of the asdict function.
@dataclass
class SomeData:
    a: int
    b: float
    c: bool

# Values match the annotated types: int, float, bool.
asdict(SomeData(10, 20.0, True))
{'a': 10, 'b': 20.0, 'c': True}
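A typical reason to convert to a dictionary is serialization. The sketch below passes the result of asdict to json.dumps from the standard library; the Settings class and its field names are hypothetical and exist only for this example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Settings:  # hypothetical example class
    name: str
    retries: int
    verbose: bool


settings = Settings("demo", 3, True)
payload = json.dumps(asdict(settings))
print(payload)  # {"name": "demo", "retries": 3, "verbose": true}

# For a flat dataclass, the decoded dictionary can rebuild an equal instance.
restored = Settings(**json.loads(payload))
print(restored == settings)  # True
```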
Note that nested dataclasses are also converted into dictionaries, so the result is a dictionary within a dictionary, not a dataclass within a dictionary.
The following cell sets one dataclass as an attribute of another. When asdict is applied to the instance, both dataclasses are converted into dictionaries in the result.
@dataclass
class InnerData:
    a: int
    b: float

@dataclass
class OuterData:
    id: InnerData
    a: int
    b: bool

data = OuterData(InnerData(10, 20), 30, True)
asdict(data)
{'id': {'a': 10, 'b': 20}, 'a': 30, 'b': True}
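If the nested dataclass should stay an object rather than become a dictionary, a shallow conversion is one option. The sketch below builds the dictionary with dataclasses.fields instead of asdict; the shallow_asdict helper and the Inner and Outer class names are hypothetical and defined here only for illustration.

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class Inner:  # hypothetical example class
    a: int


@dataclass
class Outer:  # hypothetical example class
    inner: Inner
    flag: bool


def shallow_asdict(obj):
    """Hypothetical helper: convert only the top level to a dict."""
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


data = Outer(Inner(1), True)
print(asdict(data))          # {'inner': {'a': 1}, 'flag': True}
print(shallow_asdict(data))  # {'inner': Inner(a=1), 'flag': True}
```

Unlike asdict, this version does not copy the field values, so the dictionary refers to the same Inner object as the original instance.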