Commit d077d28

Type hinting bonanza (#1)
1 parent 1515954 commit d077d28

10 files changed, +185 -89 lines changed


README.md

+90-64
@@ -2,70 +2,15 @@
 
 [![License: MPL 2.0](https://img.shields.io/badge/License-MPL%202.0-brightgreen.svg)](https://opensource.org/licenses/MPL-2.0)
 
-This is a prototype of how a library might look like for (de)serialising XML into Python dataclasses. XML dataclasses build on normal dataclasses from the standard library and [`lxml`](https://pypi.org/project/lxml/) elements. Loading and saving these elements is left to the consumer for flexibility of the desired output.
+This library enables (de)serialising XML into Python dataclasses. XML dataclasses build on normal dataclasses from the standard library and [`lxml`](https://pypi.org/project/lxml/) elements. Loading and saving these elements is left to the consumer for flexibility of the desired output.
 
-It isn't ready for production if you aren't willing to do your own evaluation/quality assurance. I don't recommend using this library with untrusted content. It inherits all of `lxml`'s flaws with regards to XML attacks, and recursively resolves data structures. Because deserialisation is driven from the dataclass definitions, it shouldn't be possible to execute arbitrary Python code. But denial of service attacks would very likely be feasible.
+It's currently in alpha. It isn't ready for production if you aren't willing to do your own evaluation/quality assurance. I don't recommend using this library with untrusted content. It inherits all of `lxml`'s flaws with regards to XML attacks, and recursively resolves data structures. Because deserialisation is driven from the dataclass definitions, it shouldn't be possible to execute arbitrary Python code (not a guarantee, see license). Denial of service attacks would very likely be feasible. One workaround may be to [use `lxml` to validate](https://lxml.de/validation.html) untrusted content with a strict schema.
 
 Requires Python 3.7 or higher.
 
-## Example
-
-(This is a simplified real world example - the container can also include optional `links` child elements.)
-
-```xml
-<?xml version="1.0"?>
-<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
-  <rootfiles>
-    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml" />
-  </rootfiles>
-</container>
-```
-
-```python
-from lxml import etree
-from typing import List
-from xml_dataclasses import xml_dataclass, rename, load, dump
-
-CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
-
-@xml_dataclass
-class RootFile:
-    __ns__ = CONTAINER_NS
-    full_path: str = rename(name="full-path")
-    media_type: str = rename(name="media-type")
-
-
-@xml_dataclass
-class RootFiles:
-    __ns__ = CONTAINER_NS
-    rootfile: List[RootFile]
-
-
-@xml_dataclass
-class Container:
-    __ns__ = CONTAINER_NS
-    version: str
-    rootfiles: RootFiles
-    # WARNING: this is an incomplete implementation of an OPF container
-
-    def xml_validate(self):
-        if self.version != "1.0":
-            raise ValueError(f"Unknown container version '{self.version}'")
-
-
-if __name__ == "__main__":
-    nsmap = {None: CONTAINER_NS}
-    # see Gotchas, stripping whitespace is highly recommended
-    parser = etree.XMLParser(remove_blank_text=True)
-    lxml_el_in = etree.parse("container.xml", parser).getroot()
-    container = load(Container, lxml_el_in, "container")
-    lxml_el_out = dump(container, "container", nsmap)
-    print(etree.tostring(lxml_el_out, encoding="unicode", pretty_print=True))
-```
-
 ## Features
 
-* XML dataclasses are also dataclasses, and only require a single decorator
+* XML dataclasses are also dataclasses, and only require a single decorator to work (but see type hinting section for issues)
 * Convert XML documents to well-defined dataclasses, which should work with IDE auto-completion
 * Loading and dumping of attributes, child elements, and text content
 * Required and optional attributes and child elements
@@ -91,7 +36,7 @@ class Foo:
     existing_field: str = rename(field(...), name="existing-field")
 ```
 
-I would like to add support for validation in future, which might also make it easier to support other types. For now, you can work around this limitation with properties that do the conversion.
+For now, you can work around this limitation with properties that do the conversion, and perform post-load validation.
 
 ### Defining text
 
@@ -122,10 +67,10 @@ Children must ultimately be other XML dataclasses. However, they can also be `Op
 * Next, `List` should be defined (if multiple child elements are allowed). Valid: `List[Union[XmlDataclass1, XmlDataclass2]]`. Invalid: `Union[List[XmlDataclass1], XmlDataclass2]`
 * Finally, if `Optional` or `List` were used, a union type should be the inner-most (again, if needed)
 
-Children can be renamed via the `rename` function, however attempting to set a namespace is invalid, since the namespace is provided by the child type's XML dataclass. Also, unions of XML dataclasses must have the same namespace (you can use different fields if they have different namespaces).
-
 If a class has children, it cannot have text content.
 
+Children can be renamed via the `rename` function. However, attempting to set a namespace is invalid, since the namespace is provided by the child type's XML dataclass. Also, unions of XML dataclasses must have the same namespace (you can use different fields with renaming if they have different namespaces, since the XML names will be resolved as a combination of namespace and name).
+
 ### Defining post-load validation
 
 Simply implement an instance method called `xml_validate` with no parameters, and no return value (if you're using type hints):
@@ -137,8 +82,89 @@ def xml_validate(self) -> None:
 
 If defined, the `load` function will call it after all values have been loaded and assigned to the XML dataclass. You can validate the fields you want inside this method. Return values are ignored; instead raise and catch exceptions.
 
+## Example (fully type hinted)
+
+(This is a simplified real world example - the container can also include optional `links` child elements.)
+
+```xml
+<?xml version="1.0"?>
+<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
+  <rootfiles>
+    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml" />
+  </rootfiles>
+</container>
+```
+
+```python
+from dataclasses import dataclass
+from typing import List
+from lxml import etree # type: ignore
+from xml_dataclasses import xml_dataclass, rename, load, dump, NsMap, XmlDataclass
+
+CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
+
+
+@xml_dataclass
+@dataclass
+class RootFile:
+    __ns__ = CONTAINER_NS
+    full_path: str = rename(name="full-path")
+    media_type: str = rename(name="media-type")
+
+
+@xml_dataclass
+@dataclass
+class RootFiles:
+    __ns__ = CONTAINER_NS
+    rootfile: List[RootFile]
+
+
+# see Gotchas, this workaround is required for type hinting
+@xml_dataclass
+@dataclass
+class Container(XmlDataclass):
+    __ns__ = CONTAINER_NS
+    version: str
+    rootfiles: RootFiles
+    # WARNING: this is an incomplete implementation of an OPF container
+
+    def xml_validate(self) -> None:
+        if self.version != "1.0":
+            raise ValueError(f"Unknown container version '{self.version}'")
+
+
+if __name__ == "__main__":
+    nsmap: NsMap = {None: CONTAINER_NS}
+    # see Gotchas, stripping whitespace is highly recommended
+    parser = etree.XMLParser(remove_blank_text=True)
+    lxml_el_in = etree.parse("container.xml", parser).getroot()
+    container = load(Container, lxml_el_in, "container")
+    lxml_el_out = dump(container, "container", nsmap)
+    print(etree.tostring(lxml_el_out, encoding="unicode", pretty_print=True))
+```
+
 ## Gotchas
 
+### Type hinting
+
+This can be a real pain to get right. Unfortunately, if you need this, you may have to resort to:
+
+```python
+@xml_dataclass
+@dataclass
+class Child:
+    __ns__ = None
+    pass
+
+@xml_dataclass
+@dataclass
+class Parent(XmlDataclass):
+    __ns__ = None
+    children: Child
+```
+
+It's important that `@dataclass` be the *last* decorator, i.e. the closest to the class definition (and so the first to be applied). Luckily, only the root class you intend to pass to `load`/`dump` has to inherit from `XmlDataclass`, but all classes should have the `@dataclass` decorator applied.
+
 ### Whitespace
 
 If you are able to, it is strongly recommended you strip whitespace from the input via `lxml`:
@@ -151,7 +177,7 @@ By default, `lxml` preserves whitespace. This can cause a problem when checking
 
 ### Optional vs required
 
-On dataclasses, optional fields also usually have a default value to be useful. But this isn't required; `Optional` is just a type hint to say `None` is allowed.
+On dataclasses, optional fields also usually have a default value to be useful. But this isn't required; `Optional` is just a type hint to say `None` is allowed. This would occur e.g. if an element has no children.
 
 For XML dataclasses, on loading/deserialisation, whether or not a field is required is determined by if it has a `default`/`default_factory` defined. If so, and it's missing, that default is used. Otherwise, an error is raised.
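Note on the hunk above: the "required unless a `default`/`default_factory` is defined" rule can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the library's implementation; `Example` and `required_map` are hypothetical names that do not appear in the commit.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class Example:
    # No default: required, even though the type hint allows None.
    name: str
    nickname: Optional[str]
    # Has a default or default_factory: not required.
    version: str = "1.0"
    tags: List[str] = field(default_factory=list)


def required_map(cls: type) -> Dict[str, bool]:
    """Hypothetical helper: map each field name to whether it is required."""
    return {
        f.name: f.default is MISSING and f.default_factory is MISSING
        for f in fields(cls)
    }


print(required_map(Example))
```

The point of the sketch is that `Optional` plays no part in the decision: `nickname` is still required because it has no default.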

@@ -163,8 +189,8 @@ This makes sense in many cases, but possibly not every case.
 
 Most of these limitations/assumptions are enforced. They may make this project unsuitable for your use-case.
 
-* It isn't possible to pass any parameters to the wrapped `@dataclass` decorator
-* Setting the `init` parameter of a dataclass' `field` will lead to bad things happening, this isn't supported
+* If you need to pass any parameters to the wrapped `@dataclass` decorator, apply it before the `@xml_dataclass` decorator
+* Setting the `init` parameter of a dataclass' `field` will lead to bad things happening, this isn't supported.
 * Deserialisation is strict; missing required attributes and child elements will cause an error. I want this to be the default behaviour, but it should be straightforward to add a parameter to `load` for lenient operation
 * Dataclasses must be written by hand, no tools are provided to generate these from, DTDs, XML schema definitions, or RELAX NG schemas
 
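Note on decorator order: the README change says that `@dataclass` must sit closest to the class, and that parameters for it should be applied before `@xml_dataclass`. The sketch below illustrates why that order works, using only the standard library. `mark_xml` is a hypothetical stand-in for an outer decorator; the assumption that such a decorator leaves an already-decorated dataclass untouched is mine, not something this commit shows.

```python
from dataclasses import FrozenInstanceError, dataclass, is_dataclass


def mark_xml(cls):
    """Hypothetical stand-in for an outer decorator such as @xml_dataclass."""
    # Assumption: only wrap with a plain @dataclass if the class is not one yet,
    # so parameters given to an inner @dataclass(...) are preserved.
    if not is_dataclass(cls):
        cls = dataclass(cls)
    cls.__marked__ = True
    return cls


@mark_xml                # applied second (outermost)
@dataclass(frozen=True)  # applied first (closest to the class)
class Point:
    x: int
    y: int = 0


point = Point(1)
try:
    point.x = 2  # frozen=True was honoured by the inner decorator
    frozen = False
except FrozenInstanceError:
    frozen = True

print(frozen, Point.__marked__)
```

Decorators are applied bottom-up, so the outer decorator sees a class that is already a dataclass with the requested options.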

functional/container_test.py

+12-7
@@ -1,10 +1,11 @@
+from dataclasses import dataclass
 from pathlib import Path
 from typing import List
 
-import pytest
-from lxml import etree
+import pytest # type: ignore
+from lxml import etree # type: ignore
 
-from xml_dataclasses import dump, load, rename, xml_dataclass
+from xml_dataclasses import NsMap, XmlDataclass, dump, load, rename, xml_dataclass
 
 from .utils import lmxl_dump
 
@@ -14,32 +15,36 @@
 
 
 @xml_dataclass
+@dataclass
 class RootFile:
     __ns__ = CONTAINER_NS
     full_path: str = rename(name="full-path")
     media_type: str = rename(name="media-type")
 
 
 @xml_dataclass
+@dataclass
 class RootFiles:
     __ns__ = CONTAINER_NS
     rootfile: List[RootFile]
 
 
 @xml_dataclass
-class Container:
+@dataclass
+class Container(XmlDataclass):
     __ns__ = CONTAINER_NS
     version: str
     rootfiles: RootFiles
     # WARNING: this is an incomplete implementation of an OPF container
 
-    def xml_validate(self):
+    def xml_validate(self) -> None:
         if self.version != "1.0":
             raise ValueError(f"Unknown container version '{self.version}'")
 
 
 @pytest.mark.parametrize("remove_blank_text", [True, False])
-def test_functional_container_no_whitespace(remove_blank_text):
+def test_functional_container_no_whitespace(remove_blank_text): # type: ignore
+    nsmap: NsMap = {None: CONTAINER_NS}
     parser = etree.XMLParser(remove_blank_text=remove_blank_text)
     el = etree.parse(str(BASE / "container.xml"), parser).getroot()
     original = lmxl_dump(el)
@@ -55,6 +60,6 @@ def test_functional_container_no_whitespace(remove_blank_text):
             ],
         ),
     )
-    el = dump(container, "container", {None: CONTAINER_NS})
+    el = dump(container, "container", nsmap)
     roundtrip = lmxl_dump(el)
     assert original == roundtrip

functional/utils.py

+5-3
@@ -1,8 +1,10 @@
-from lxml import etree
+from typing import Any
 
+from lxml import etree # type: ignore
 
-def lmxl_dump(el):
-    encoded = etree.tostring(
+
+def lmxl_dump(el: Any) -> str:
+    encoded: bytes = etree.tostring(
         el, encoding="utf-8", pretty_print=True, xml_declaration=True
     )
     return encoded.decode("utf-8")

lint

+2
@@ -22,4 +22,6 @@ else
     coverage html
     exit 1
 fi
+
+mypy functional/container_test.py --strict
 pytest functional/ --random-order $PYTEST_DEBUG

pyproject.toml

+1-1
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "xml_dataclasses"
-version = "0.0.4"
+version = "0.0.5"
 description = "(De)serialize XML documents into specially-annotated dataclasses"
 authors = ["Toby Fleming <[email protected]>"]
 license = "MPL-2.0"

src/xml_dataclasses/__init__.py

+20-1
@@ -3,5 +3,24 @@
 logging.getLogger(__name__).addHandler(logging.NullHandler())
 
 from .modifiers import rename, text # isort:skip
-from .resolve_types import xml_dataclass # isort:skip
+from .resolve_types import ( # isort:skip
+    is_xml_dataclass,
+    xml_dataclass,
+    NsMap,
+    XmlDataclass,
+)
 from .serde import dump, load # isort:skip
+
+
+# __all__ is required for mypy to pick up the imports
+# for errors, use `from xml_dataclasses.errors import ...`
+__all__ = [
+    "rename",
+    "text",
+    "dump",
+    "load",
+    "is_xml_dataclass",
+    "xml_dataclass",
+    "NsMap",
+    "XmlDataclass",
+]

src/xml_dataclasses/modifiers.py

+10-4
@@ -14,12 +14,15 @@ def make_field(default: Union[_T, _MISSING_TYPE]) -> Field[_T]:
     return field(default=default)
 
 
+# NOTE: Actual return type is 'Field[_T]', but we want to help type checkers
+# to understand the magic that happens at runtime.
+# see https://github.com/python/typeshed/blob/master/stdlib/3.7/dataclasses.pyi
 def rename(
     f: Optional[Field[_T]] = None,
     default: Union[_T, _MISSING_TYPE] = MISSING,
     name: Optional[str] = None,
     ns: Optional[str] = None,
-) -> Field[_T]:
+) -> _T:
     if f is None:
         f = make_field(default=default)
     metadata = dict(f.metadata)
@@ -28,15 +31,18 @@ def rename(
     if ns:
         metadata["xml:ns"] = ns
     f.metadata = metadata
-    return f
+    return f # type: ignore
 
 
+# NOTE: Actual return type is 'Field[_T]', but we want to help type checkers
+# to understand the magic that happens at runtime.
+# see https://github.com/python/typeshed/blob/master/stdlib/3.7/dataclasses.pyi
 def text(
     f: Optional[Field[_T]] = None, default: Union[_T, _MISSING_TYPE] = MISSING
-) -> Field[_T]:
+) -> _T:
     if f is None:
         f = make_field(default=default)
     metadata = dict(f.metadata)
     metadata["xml:text"] = True
     f.metadata = metadata
-    return f
+    return f # type: ignore
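
Note on the return-type change above: annotating a field helper as returning `_T` rather than `Field[_T]` lets a type checker accept `full_path: str = helper(...)`, while `@dataclass` still receives a real `Field` at runtime. The following is a minimal standalone sketch of that idea, not the code from this commit; `renamed` is a hypothetical helper, and the `"xml:name"` metadata key is an assumption chosen to resemble the `"xml:ns"` and `"xml:text"` keys visible in the diff.

```python
from dataclasses import Field, dataclass, field, fields
from typing import Any, TypeVar

_T = TypeVar("_T")


def renamed(default: _T, name: str) -> _T:
    """Hypothetical helper: declared to return the default's type, not Field."""
    # At runtime this is a dataclasses.Field; typing it as Any lets the
    # function return it under the declared (deliberately inexact) type _T.
    f: Any = field(default=default, metadata={"xml:name": name})
    return f


@dataclass
class RootFile:
    # A type checker sees `str = str`; @dataclass sees a Field with metadata.
    full_path: str = renamed("unknown", name="full-path")


print(RootFile().full_path)
print(fields(RootFile)[0].metadata["xml:name"])
```

The standard library's own `dataclasses.field` is annotated the same way in typeshed, which is what the `NOTE` comments in the diff refer to.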
