
Validation errors are hard to present safely to the user (missing abstraction) #827

@e3krisztian

https://cyclonedx-python-library.readthedocs.io/en/v10.2.0/autoapi/cyclonedx/validation/

We use both JSON and XML inputs. When something is wrong with the input, it is not easy to get the location of the problem, and even the description of what is wrong can be buried in a multi-MB message.

One problem is that the underlying libraries make it hard:

  • jsonschema includes the whole input (the instance) in the error message. For an SBOM this can be very large, which produces the multi-MB message mentioned above. For example, this uniqueItems check can fail on the dependencies:
            yield ValidationError(f"{instance!r} has non-unique elements")
  • In the XML case, the easiest solution was apparently to take the error from the log:
    return ValidationError(validator.error_log.last_error)

The other problem is that CycloneDX makes no attempt to transform these different object types into something sensible and type-safe for users; the raw objects are simply leaked through the interface as-is.

Code samples triggering long messages:

from cyclonedx.validation.json import JsonStrictValidator
from cyclonedx.schema import SchemaVersion

test_data_file = "tests/_data/schemaTestData/1.2/invalid-license-id-1.2.json"
schema_version = SchemaVersion.V1_2
validator = JsonStrictValidator(schema_version)
with open(test_data_file) as tdfh:
    test_data = tdfh.read()
validation_error = validator.validate_str(test_data)
print(str(validation_error))

This message is 35508 characters long - 767 lines!

from cyclonedx.validation.xml import XmlValidator
from cyclonedx.schema import SchemaVersion

test_data_file = "tests/_data/schemaTestData/1.1/invalid-license-id-1.1.xml"
schema_version = SchemaVersion.V1_1

validator = XmlValidator(schema_version)
with open(test_data_file) as tdfh:
    test_data = tdfh.read()
validation_error = validator.validate_str(test_data)
print(str(validation_error))

This message is 12423 characters long - 1 line.
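Until the library offers something better, a caller can pull a location and a bounded message out of the raw object by duck typing. The sketch below is only an illustration: it assumes the raw error object exposes `message` plus either `json_path` (as jsonschema errors do) or `path` (as lxml log entries do). `summarize_raw_error` and `max_len` are hypothetical names, not part of CycloneDX.

```python
from types import SimpleNamespace
from typing import Any


def summarize_raw_error(raw: Any, max_len: int = 200) -> str:
    """Return 'location: message' with the message capped at max_len characters."""
    # Assumption: jsonschema errors expose `json_path`, lxml log entries expose `path`.
    location = getattr(raw, "json_path", None) or getattr(raw, "path", None) or "<unknown>"
    # Fall back to str(raw) if the object has no `message` attribute.
    message = str(getattr(raw, "message", raw))
    if len(message) > max_len:
        message = message[:max_len] + "... [truncated]"
    return f"{location}: {message}"


# Stand-in object for illustration only; real code would pass the raw
# error object carried by the returned validation error.
fake = SimpleNamespace(json_path="$.dependencies", message="x" * 1000)
print(summarize_raw_error(fake, max_len=20))
```

This keeps the output safe to display, but every caller has to know about both third-party error types, which is exactly the leak described above.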


I would expect the errors returned/raised by CycloneDX to look something like this:

from typing import Any


class ValidationError:
    # abstract class
    data: Any
    "raw problem, for debugging"

    path: str
    message: str

    def __init__(self, data: Any) -> None:
        # keep the raw third-party object available, as today
        self.data = data


class XmlValidationError(ValidationError):
    # this subclass knows what data is (lxml.etree._LogEntry)
    @property
    def path(self) -> str:
        return self.data.path

    @property
    def message(self) -> str:
        return self.data.message


class JsonValidationError(ValidationError):
    # this subclass knows what data is (jsonschema.exceptions.ValidationError)
    @property
    def path(self) -> str:
        return self.data.json_path

    @property
    def message(self) -> str:
        # ensures the error is transformed into something sensible,
        # resolving a problem caused by using jsonschema for CycloneDX users
        instance = repr(self.data.instance)
        return self.data.message.replace(instance, shortened(instance))
        # where shortened(long_text) ~ 'first n ... last n', that is, the middle of the string is replaced;
        # this would still give some context, but it would be safe to display

These classes would provide a stable abstraction over the generally useful properties of a validation error. They would also hide implementation details from users, such as the third-party objects lxml.etree._LogEntry and jsonschema.exceptions.ValidationError. The proposal is also backward compatible: data is kept intact in case someone depends on it.
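The proposal refers to a `shortened` helper without defining it. A minimal sketch of one possible implementation follows; the parameter names `keep` and `marker` and their defaults are assumptions made for illustration.

```python
def shortened(text: str, keep: int = 40, marker: str = " ... ") -> str:
    """Keep the first and last `keep` characters and replace the middle with `marker`."""
    if keep <= 0:
        raise ValueError("keep must be positive")
    # Leave the text alone when shortening would not make it any shorter.
    if len(text) <= 2 * keep + len(marker):
        return text
    return text[:keep] + marker + text[-keep:]


# A large instance repr, similar in spirit to a long dependency list.
long_repr = repr(list(range(10_000)))
print(len(long_repr), len(shortened(long_repr, keep=10)))
print(shortened(long_repr, keep=10))
```

Replacing the instance text inside the message only works when the message embeds exactly `repr(instance)`, as in the uniqueItems example; for other message shapes, capping the total length may be a safer fallback.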

Labels: enhancement, help wanted
