Description
https://cyclonedx-python-library.readthedocs.io/en/v10.2.0/autoapi/cyclonedx/validation/
We use both JSON and XML inputs. When something is wrong with the input, it is not easy to get the location of the problem, and even the description of what is wrong can be buried in a multi-MB message.
One of the problems is that the underlying libraries make it hard:
- jsonschema includes the whole input (instance) in the error message, which in the SBOM case can be quite big, producing the multi-MB message mentioned above (this uniqueItems check can fail on e.g. the dependencies):
  yield ValidationError(f"{instance!r} has non-unique elements")
- in the XML case, the easiest solution was apparently to take the error from the logs:
  return ValidationError(validator.error_log.last_error)
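To make the size problem concrete, here is a small standalone sketch that mimics the formatting of the quoted jsonschema line in plain Python. It does not call jsonschema itself, and the dependencies list is invented data standing in for a real SBOM.

```python
# Stand-in for an SBOM "dependencies" array; the refs are invented for illustration.
dependencies = [
    {"ref": f"pkg:pypi/example-{i}@1.0.0", "dependsOn": []}
    for i in range(10_000)
]
dependencies.append(dependencies[0])  # one duplicate is enough to violate uniqueItems

# Same formatting as the quoted line: the whole instance is repr()'d into the message.
instance = dependencies
message = f"{instance!r} has non-unique elements"

# The message length grows linearly with the size of the array.
print(len(message))
```

The useful part of the message is the fixed suffix; everything before it is a dump of the input.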
The other problem is that CycloneDX makes no attempt at transforming these different object types into something sensible and type-safe for users; the raw objects are simply leaked through the interface as-is in data: Any.
Code samples triggering long messages:
from cyclonedx.validation.json import JsonStrictValidator
from cyclonedx.schema import SchemaVersion
test_data_file = "tests/_data/schemaTestData/1.2/invalid-license-id-1.2.json"
schema_version = SchemaVersion.V1_2
validator = JsonStrictValidator(schema_version)
with open(test_data_file) as tdfh:
    test_data = tdfh.read()
validation_error = validator.validate_str(test_data)
print(str(validation_error))
This message is 35508 characters long - 767 lines!
from cyclonedx.validation.xml import XmlValidator
from cyclonedx.schema import SchemaVersion
test_data_file = "tests/_data/schemaTestData/1.1/invalid-license-id-1.1.xml"
schema_version = SchemaVersion.V1_1
validator = XmlValidator(schema_version)
with open(test_data_file) as tdfh:
    test_data = tdfh.read()
validation_error = validator.validate_str(test_data)
print(str(validation_error))
This message is 12423 characters long - 1 line.
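As a stopgap, a caller can dig into validation_error.data and pull out a location and a short message by duck typing. The sketch below is only an illustration: locate is a hypothetical helper name, and it assumes the raw object exposes json_path and message (as jsonschema's ValidationError does) or path and message (as an lxml log entry does). The stand-in objects are made up so the example runs without either library installed.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def locate(data: Any) -> Tuple[Optional[str], str]:
    """Best-effort (path, message) from a raw validation error object.

    Hypothetical helper: assumes `data` has `json_path`/`message`
    (jsonschema-style) or `path`/`message` (lxml-log-style).
    Anything else falls back to str(data) with no path.
    """
    # Check json_path first: jsonschema errors also carry a `path` attribute,
    # but that one is a deque of path segments rather than a string.
    path = getattr(data, "json_path", None) or getattr(data, "path", None)
    message = getattr(data, "message", None)
    return (
        str(path) if path is not None else None,
        str(message) if message is not None else str(data),
    )


# Invented stand-ins for the two kinds of raw error objects.
json_like = SimpleNamespace(
    json_path="$.components[0].licenses[0].license.id",
    message="not a valid license id",
)
xml_like = SimpleNamespace(
    path="/bom/components/component/licenses/license/id",
    message="invalid value",
)

print(locate(json_like))
print(locate(xml_like))
```

This works around the problem on the caller's side, but every user has to know which third-party type sits behind data, which is exactly the leak described above.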
I would expect the errors returned/raised by CycloneDX to look something like this:
from typing import Any


class ValidationError:
    # abstract class
    data: Any
    "raw problem, for debugging"
    path: str
    message: str


class XmlValidationError(ValidationError):
    # this subclass knows what data is
    @property
    def path(self) -> str:
        return self.data.path

    @property
    def message(self) -> str:
        return self.data.message


class JsonValidationError(ValidationError):
    # this subclass knows what data is
    @property
    def path(self) -> str:
        return self.data.json_path

    @property
    def message(self) -> str:
        # ensures the error is transformed to something sensible,
        # resolving a problem caused by using jsonschema for CycloneDX users
        instance = repr(self.data.instance)
        return self.data.message.replace(instance, shortened(instance))

# where shortened(long_text) ~ 'first n ... last n', that is, the middle of the string is replaced
# this would still add some context, but it will be safe to display
These would provide a stable abstraction over generally useful validation error properties, and also hide implementation details from users, like the third-party objects lxml.etree._LogEntry and jsonschema.exceptions.ValidationError. The above proposal is also backward compatible, keeping data intact, if someone depends on it.
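For completeness, one possible shape of the shortened helper described in the comments is sketched below. The keep and marker parameters and their defaults are assumptions chosen for illustration, not part of any existing CycloneDX API.

```python
def shortened(text: str, keep: int = 40, marker: str = " ... ") -> str:
    """Return the first and last `keep` characters with the middle replaced.

    Hypothetical helper; strings that are already short are returned unchanged.
    """
    if keep <= 0:
        raise ValueError("keep must be positive")
    if len(text) <= 2 * keep + len(marker):
        return text
    return text[:keep] + marker + text[-keep:]


# Mimic a jsonschema-style message that embeds the repr of a large instance.
instance = repr(list(range(5000)))
message = f"{instance} has non-unique elements"

# Replace the embedded repr with its shortened form, as in the proposal.
safe = message.replace(instance, shortened(instance))
print(safe)
```

The result keeps both ends of the offending value as context while bounding the message length, so it stays safe to log or display.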