Refactor format validation
ref #35
marksparkza committed Aug 21, 2022
1 parent 84252c4 commit e504884
Showing 10 changed files with 144 additions and 221 deletions.
4 changes: 2 additions & 2 deletions docs/examples/format_validation.rst
@@ -1,7 +1,7 @@
Format validation
=================
The script below demonstrates the implementation of format validators
for the ``"ipv4"``, ``"ipv6"`` and ``"hostname"`` format attributes.
In this example we register and enable validators for the ``ipv4``
and ``ipv6`` formats.

.. literalinclude:: ../../examples/format_validation.py

61 changes: 9 additions & 52 deletions docs/guide/catalog.rst
@@ -68,58 +68,15 @@ directly from the catalog:
See :doc:`../examples/file_based_schemas` for further examples of loading
schemas from disk.

Format validators
Format validation
-----------------
jschon does not provide built-in support for validating JSON Schema
`formats <https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.7.3>`_.
By default, any occurrence of the ``"format"`` keyword in a schema simply passes,
with its value -- its *format attribute* -- collected as an annotation.

To validate a given format attribute, we can define a *format validator*.

The :meth:`~jschon.catalog.Catalog.add_format_validators` method accepts a
dictionary of :class:`~jschon.vocabulary.format.FormatValidator` objects indexed
by format attribute. A :class:`~jschon.vocabulary.format.FormatValidator`
is simply a callable that accepts a single argument -- the value to be validated --
and raises a :exc:`ValueError` if a supplied value is invalid.

For example, suppose that we'd like to validate that any occurrence of an IP address
or hostname in a JSON document conforms to the ``"ipv4"``, ``"ipv6"`` or ``"hostname"``
format. For the IP address formats, we can use the :class:`ipaddress.IPv*Address`
classes, available in the Python standard library, since their constructors raise
a :exc:`ValueError` for an invalid constructor argument. For the hostname format,
we'll define a validation function using a hostname `regex <https://stackoverflow.com/a/106223>`_.
Our catalog setup looks like this:

>>> import ipaddress
>>> import re
>>> from jschon import Catalog
...
>>> def validate_hostname(value):
...     hostname_regex = re.compile(r"^(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])$")
...     if not hostname_regex.match(value):
...         raise ValueError(f"'{value}' is not a valid hostname")
...
>>> catalog = create_catalog('2020-12')
>>> catalog.add_format_validators({
...     "ipv4": ipaddress.IPv4Address,
...     "ipv6": ipaddress.IPv6Address,
...     "hostname": validate_hostname,
... })
By default, formats are not validated in jschon. Any occurrence of the ``format``
keyword simply produces an annotation consisting of the keyword's value, called
the *format attribute*.

Now, we can define a schema that returns a validation failure for any JSON document
that contains incorrectly formatted IP addresses or hostnames. The following
simple example validates a single string instance:
Format validators can be registered using the :func:`~jschon.vocabulary.format.format_validator`
decorator. Format attributes must, however, be explicitly enabled for validation
in the catalog, in order to use any registered format validator. This can be done
using :meth:`~jschon.catalog.Catalog.enable_formats`.

>>> from jschon import JSONSchema
>>> schema = JSONSchema({
... "$schema": "https://json-schema.org/draft/2020-12/schema",
... "type": "string",
... "anyOf": [
... {"format": "ipv4"},
... {"format": "ipv6"},
... {"format": "hostname"}
... ]
... })

For a complete working example, see :doc:`../examples/format_validation`.
For a working example, see :doc:`../examples/format_validation`.
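
The two-step model in the new guide text (register a validator with a decorator, then enable its format attribute on the catalog) can be tried out without jschon installed. The following is a stdlib-only sketch of that pattern under stated assumptions, not jschon's implementation: `_registry`, `MiniCatalog` and `check_format` are hypothetical names introduced here, and only the names `format_validator`, `enable_formats` and `is_format_enabled` mirror the ones that appear in this commit.

```python
import ipaddress
from typing import Callable, Dict, Optional, Set

# Hypothetical module-level registry; how jschon stores validators is not shown in this diff.
_registry: Dict[str, Callable[[object], None]] = {}


def format_validator(format_attr: str):
    """Register the decorated function as the validator for format_attr."""
    def decorator(func):
        _registry[format_attr] = func
        return func
    return decorator


@format_validator('ipv4')
def validate_ipv4(value) -> None:
    if isinstance(value, str):
        ipaddress.IPv4Address(value)  # raises ValueError for an invalid IPv4 address


class MiniCatalog:
    """Hypothetical stand-in that tracks only the enabled format attributes."""

    def __init__(self) -> None:
        self._enabled_formats: Set[str] = set()

    def enable_formats(self, *format_attr: str) -> None:
        self._enabled_formats |= set(format_attr)

    def is_format_enabled(self, format_attr: str) -> bool:
        return format_attr in self._enabled_formats


def check_format(catalog: MiniCatalog, format_attr: str, value) -> Optional[str]:
    """Return an error message, or None if the value passes or the format is not enforced."""
    validator = _registry.get(format_attr)
    if validator is None or not catalog.is_format_enabled(format_attr):
        return None  # annotation-only behaviour: nothing is validated
    try:
        validator(value)
    except ValueError as e:
        return str(e)
    return None


catalog = MiniCatalog()
before = check_format(catalog, 'ipv4', '127.0.1')  # registered, but not yet enabled
catalog.enable_formats('ipv4')
after = check_format(catalog, 'ipv4', '127.0.1')   # enabled, so the validator runs
```

In this sketch `before` is `None` because registration alone does not switch validation on, while `after` carries the `ValueError` message once the format has been enabled.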
3 changes: 3 additions & 0 deletions docs/reference/formats.rst
@@ -0,0 +1,3 @@
jschon.formats
==============
.. automodule:: jschon.formats
87 changes: 32 additions & 55 deletions examples/format_validation.py
@@ -1,77 +1,54 @@
import ipaddress
import pprint
import re

from jschon import create_catalog, JSON, JSONSchema
from jschon import JSON, JSONSchema, create_catalog
from jschon.vocabulary.format import format_validator


# define a "hostname" format validation function
def validate_hostname(value):
    hostname_regex = re.compile(
        r"^(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])$")
    if not hostname_regex.match(value):
        raise ValueError(f"'{value}' is not a valid hostname")
# register an 'ipv4' format validator
@format_validator('ipv4')
def validate_ipv4(value: str) -> None:
    if isinstance(value, str):
        ipaddress.IPv4Address(value)  # raises ValueError for an invalid IPv4 address


# create a catalog with support for JSON Schema version 2020-12
# register an 'ipv6' format validator
@format_validator('ipv6')
def validate_ipv6(value: str) -> None:
    if isinstance(value, str):
        ipaddress.IPv6Address(value)  # raises ValueError for an invalid IPv6 address


# initialize the catalog, with JSON Schema 2020-12 vocabulary support
catalog = create_catalog('2020-12')

# register IP address and hostname format validators
catalog.add_format_validators({
"ipv4": ipaddress.IPv4Address,
"ipv6": ipaddress.IPv6Address,
"hostname": validate_hostname,
})
# enable validation with the 'ipv4' and 'ipv6' format validators
catalog.enable_formats('ipv4', 'ipv6')

# create a schema for validating an array of host records
hosts_schema = JSONSchema({
# create a schema for validating an array of IP addresses
schema = JSONSchema({
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/hosts-schema",
"$id": "https://example.com/schema",
"type": "array",
"items": {
"type": "object",
"properties": {
"ipaddress": {
"type": "string",
"oneOf": [
{"format": "ipv4"},
{"format": "ipv6"}
]
},
"hostname": {
"type": "string",
"format": "hostname"
}
},
"required": ["ipaddress", "hostname"]
"type": "string",
"anyOf": [
{"format": "ipv4"},
{"format": "ipv6"}
]
}
})

# declare a host record array containing valid IP addresses and hostnames
valid_host_records = JSON([
{"ipaddress": "127.0.0.1", "hostname": "localhost"},
{"ipaddress": "10.0.0.8", "hostname": "server.local"},
])

# declare a host record array containing some values that are invalid
# per the registered format validators
invalid_host_records = JSON([
{"ipaddress": "127.0.0.1", "hostname": "~localhost"},
{"ipaddress": "10.0.0", "hostname": "server.local"},
])

# evaluate the valid array
valid_result = hosts_schema.evaluate(valid_host_records)
# evaluate a valid array
valid_result = schema.evaluate(JSON(['127.0.0.1', '::1']))

# evaluate the invalid array
invalid_result = hosts_schema.evaluate(invalid_host_records)
# evaluate an invalid array
invalid_result = schema.evaluate(JSON(['127.0.1', '::1']))

# print output for the valid case
print(f'Valid array result: {valid_result.valid}')
print('Valid array basic output:')
print('Valid case output:')
pprint.pp(valid_result.output('basic'))

# print output for the invalid case
print(f'Invalid array result: {invalid_result.valid}')
print('Invalid array detailed output:')
pprint.pp(invalid_result.output('detailed'))
print('Invalid case output:')
pprint.pp(invalid_result.output('basic'))
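
Both validators in the new script rely on the `ipaddress` constructors raising `ValueError` for malformed input. That behaviour can be checked on its own with the standard library, using the same values as the script; `first_error` is a hypothetical helper added here for illustration only.

```python
import ipaddress


def first_error(constructor, value: str):
    """Return the ValueError message raised by constructor(value), or None if it parses."""
    try:
        constructor(value)
    except ValueError as e:
        return str(e)
    return None


# '127.0.1' is the invalid item used in the example script
ipv4_error = first_error(ipaddress.IPv4Address, '127.0.1')
ipv6_error = first_error(ipaddress.IPv6Address, '127.0.1')

# the two valid items from the example script
loopback_v4 = first_error(ipaddress.IPv4Address, '127.0.0.1')
loopback_v6 = first_error(ipaddress.IPv6Address, '::1')

print(ipv4_error)
print(ipv6_error)
```

Because `'127.0.1'` is rejected by both constructors, neither `anyOf` branch of the schema can succeed for that item, which is why the invalid case reports an error for each format.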
77 changes: 25 additions & 52 deletions examples/output/format_validation.txt
@@ -1,62 +1,35 @@
Valid array result: True
Valid array basic output:
Valid case output:
{'valid': True,
'annotations': [{'instanceLocation': '',
'keywordLocation': '/items',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items',
'absoluteKeywordLocation': 'https://example.com/schema#/items',
'annotation': True},
{'instanceLocation': '/0',
'keywordLocation': '/items/properties',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items/properties',
'annotation': ['ipaddress', 'hostname']},
{'instanceLocation': '/0/ipaddress',
'keywordLocation': '/items/properties/ipaddress/oneOf/0/format',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items/properties/ipaddress/oneOf/0/format',
'keywordLocation': '/items/anyOf/0/format',
'absoluteKeywordLocation': 'https://example.com/schema#/items/anyOf/0/format',
'annotation': 'ipv4'},
{'instanceLocation': '/0/hostname',
'keywordLocation': '/items/properties/hostname/format',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items/properties/hostname/format',
'annotation': 'hostname'},
{'instanceLocation': '/1',
'keywordLocation': '/items/properties',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items/properties',
'annotation': ['ipaddress', 'hostname']},
{'instanceLocation': '/1/ipaddress',
'keywordLocation': '/items/properties/ipaddress/oneOf/0/format',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items/properties/ipaddress/oneOf/0/format',
'annotation': 'ipv4'},
{'instanceLocation': '/1/hostname',
'keywordLocation': '/items/properties/hostname/format',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items/properties/hostname/format',
'annotation': 'hostname'}]}
Invalid array result: False
Invalid array detailed output:
'keywordLocation': '/items/anyOf/1/format',
'absoluteKeywordLocation': 'https://example.com/schema#/items/anyOf/1/format',
'annotation': 'ipv6'}]}
Invalid case output:
{'valid': False,
'instanceLocation': '',
'keywordLocation': '',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#',
'errors': [{'instanceLocation': '',
'keywordLocation': '/items',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items',
'errors': [{'instanceLocation': '/0/hostname',
'keywordLocation': '/items/properties/hostname/format',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items/properties/hostname/format',
'error': 'The instance is invalid against the '
'"hostname" format: \'~localhost\' is not a '
'valid hostname'},
{'instanceLocation': '/1/ipaddress',
'keywordLocation': '/items/properties/ipaddress/oneOf',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items/properties/ipaddress/oneOf',
'errors': [{'instanceLocation': '/1/ipaddress',
'keywordLocation': '/items/properties/ipaddress/oneOf/0/format',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items/properties/ipaddress/oneOf/0/format',
'error': 'The instance is invalid against '
'the "ipv4" format: Expected 4 '
"octets in '10.0.0'"},
{'instanceLocation': '/1/ipaddress',
'keywordLocation': '/items/properties/ipaddress/oneOf/1/format',
'absoluteKeywordLocation': 'https://example.com/hosts-schema#/items/properties/ipaddress/oneOf/1/format',
'error': 'The instance is invalid against '
'the "ipv6" format: At least 3 '
'parts expected in '
"'10.0.0'"}]}]}]}
'absoluteKeywordLocation': 'https://example.com/schema#/items',
'error': [0]},
{'instanceLocation': '/0',
'keywordLocation': '/items/anyOf',
'absoluteKeywordLocation': 'https://example.com/schema#/items/anyOf',
'error': 'The instance must be valid against at least one '
'subschema'},
{'instanceLocation': '/0',
'keywordLocation': '/items/anyOf/0/format',
'absoluteKeywordLocation': 'https://example.com/schema#/items/anyOf/0/format',
'error': 'The instance is invalid against the "ipv4" format: '
"Expected 4 octets in '127.0.1'"},
{'instanceLocation': '/0',
'keywordLocation': '/items/anyOf/1/format',
'absoluteKeywordLocation': 'https://example.com/schema#/items/anyOf/1/format',
'error': 'The instance is invalid against the "ipv6" format: At '
"least 3 parts expected in '127.0.1'"}]}
42 changes: 14 additions & 28 deletions jschon/catalog/__init__.py
@@ -3,8 +3,9 @@
import pathlib
import uuid
from contextlib import contextmanager
from importlib import import_module
from os import PathLike
from typing import Any, ContextManager, Dict, Hashable, Mapping, Union
from typing import Any, ContextManager, Dict, Hashable, Set, Union

from jschon.exceptions import CatalogError, JSONPointerError, URIError
from jschon.json import JSONCompatible
@@ -13,7 +14,6 @@
from jschon.uri import URI
from jschon.utils import json_loadf, json_loadr
from jschon.vocabulary import KeywordClass, Metaschema, Vocabulary
from jschon.vocabulary.format import FormatValidator

__all__ = [
'Catalog',
@@ -85,8 +85,8 @@ def __init__(self, name: str = 'catalog') -> None:

        self._uri_sources: Dict[URI, Source] = {}
        self._vocabularies: Dict[URI, Vocabulary] = {}
        self._format_validators: Dict[str, FormatValidator] = {}
        self._schema_cache: Dict[Hashable, Dict[URI, JSONSchema]] = {}
        self._enabled_formats: Set[str] = set()

    def add_uri_source(self, base_uri: URI, source: Source):
        """Register a source for URI-identified JSON resources.
@@ -196,33 +196,19 @@ def create_metaschema(
        if not metaschema.validate().valid:
            raise CatalogError("The metaschema is invalid against itself")

    def add_format_validators(self, validators: Mapping[str, FormatValidator]) -> None:
        """Register a collection of format validators.
        In jschon, a given occurrence of the ``"format"`` keyword evaluates
        a JSON instance using a format validation callable, if one has been
        registered for the applicable *format attribute* (the keyword's value).
        If a validator has not been registered for that format attribute,
        keyword evaluation simply passes.
        :param validators: a dictionary of :class:`~jschon.vocabulary.format.FormatValidator`
            callables, keyed by format attribute
        """
        self._format_validators.update(validators)

    def get_format_validator(self, format_attr: str) -> FormatValidator:
        """Get a registered :class:`~jschon.vocabulary.format.FormatValidator`
        function.
    def enable_formats(self, *format_attr: str) -> None:
        """Enable validation of the specified format attributes.
        :param format_attr: the format attribute (``"format"`` keyword value)
            to which the validator applies
        :raise CatalogError: if no format validator is registered for the
            given `format_attr`
        These may include formats defined in :mod:`jschon.formats`
        and elsewhere.
        """
        try:
            return self._format_validators[format_attr]
        except KeyError:
            raise CatalogError(f"Unsupported format attribute '{format_attr}'")
        import_module('jschon.formats')
        self._enabled_formats |= set(format_attr)

    def is_format_enabled(self, format_attr) -> bool:
        """Return True if validation is enabled for `format_attr`,
        False otherwise."""
        return format_attr in self._enabled_formats

def add_schema(
self,
9 changes: 9 additions & 0 deletions jschon/formats.py
@@ -0,0 +1,9 @@
from jschon.jsonpointer import JSONPointer
from jschon.vocabulary.format import format_validator


@format_validator('json-pointer')
def validate_json_pointer(value: str) -> None:
    if isinstance(value, str):
        if not JSONPointer._json_pointer_re.fullmatch(value):
            raise ValueError
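
`JSONPointer._json_pointer_re` is not shown in this diff. As a rough illustration of what such a check involves, the sketch below uses a hand-written pattern based on the RFC 6901 grammar: zero or more `/`-prefixed reference tokens, with `~` permitted only as part of `~0` or `~1`. The pattern, the error message and the `looks_like_json_pointer` helper are assumptions made here and may differ from what jschon actually does.

```python
import re

# Assumed pattern derived from the RFC 6901 grammar; not copied from jschon.
_JSON_POINTER_RE = re.compile(r'(/([^/~]|~[01])*)*')


def validate_json_pointer(value) -> None:
    """Raise ValueError if value is a string that is not a JSON pointer."""
    if isinstance(value, str):
        if not _JSON_POINTER_RE.fullmatch(value):
            raise ValueError(f"'{value}' is not a valid JSON pointer")


def looks_like_json_pointer(value) -> bool:
    """Hypothetical convenience wrapper that turns the exception into a boolean."""
    try:
        validate_json_pointer(value)
    except ValueError:
        return False
    return True
```

As in the committed validator, non-string values are skipped rather than rejected, so the format check applies only to string instances.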