fix: check-yml encoding error #1157

Open
wants to merge 2 commits into
base: main

Conversation

mircodariol

I noticed that the official check_yaml hook raises a UnicodeDecodeError when a file has an invalid encoding, without giving any details about which YAML file is affected. This makes it very difficult for the developer to figure out which file is causing the problem.

Example stack trace:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\*******\.cache\pre-commit\repo5cis_pzf\py_env-python3.13\Scripts\check-yaml.EXE\__main__.py", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "C:\Users\*******\.cache\pre-commit\repo5cis_pzf\py_env-python3.13\Lib\site-packages\pre_commit_hooks\check_yaml.py", line 64, in main
    load_fn(f)
    ~~~~~~~^^^
  File "C:\Users\*******\.cache\pre-commit\repo5cis_pzf\py_env-python3.13\Lib\site-packages\ruamel\yaml\main.py", line 449, in load
    constructor, parser = self.get_constructor_parser(stream)
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "C:\Users\*******\.cache\pre-commit\repo5cis_pzf\py_env-python3.13\Lib\site-packages\ruamel\yaml\main.py", line 500, in get_constructor_parser
    self.reader.stream = stream
    ^^^^^^^^^^^^^^^^^^
  File "C:\Users\*******\.cache\pre-commit\repo5cis_pzf\py_env-python3.13\Lib\site-packages\ruamel\yaml\reader.py", line 120, in stream
    self.determine_encoding()
    ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "C:\Users\*******\.cache\pre-commit\repo5cis_pzf\py_env-python3.13\Lib\site-packages\ruamel\yaml\reader.py", line 174, in determine_encoding
    self.update_raw()
    ~~~~~~~~~~~~~~~^^
  File "C:\Users\*******\.cache\pre-commit\repo5cis_pzf\py_env-python3.13\Lib\site-packages\ruamel\yaml\reader.py", line 263, in update_raw
    data = self.stream.read(size)
  File "<frozen codecs>", line 325, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 1665: invalid start byte
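
For readers who want to reproduce the final line of the traceback outside the hook, here is a minimal sketch. It assumes, as one plausible cause and not something confirmed in this report, that the file was saved in a single-byte encoding such as cp1252, where 0xa9 is the copyright sign. In UTF-8 that byte is a continuation byte and cannot start a character.

```python
# Assumption for illustration: the file content was written as cp1252.
raw = "name: ©".encode("cp1252")  # b'name: \xa9'

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
    # The message names the byte and its offset, but not the file.
    print(message)
    # → 'utf-8' codec can't decode byte 0xa9 in position 6: invalid start byte
```

Note that the exception carries the byte offset only, which is why the traceback above gives no hint about the file path.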

This PR improves the reliability of the check_yaml hook by making it also handle YAML files that contain bytes which are invalid in the expected file encoding.
When a UnicodeDecodeError occurs, the hook reports the failure and prints a short message with the path of the file that caused it.
The same behaviour applies to ruamel.yaml.YAMLError (with a different error message).
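
The behaviour described above could look roughly like the following sketch. This is not the actual diff of this PR: the function name check_files and the default load_fn (which only reads the file) are hypothetical stand-ins so the example runs with the standard library alone. In the real hook, load_fn would be the ruamel.yaml loader, and a parser error could be caught and reported in the same way.

```python
from typing import IO, Callable, Sequence


def check_files(
    filenames: Sequence[str],
    # Hypothetical stand-in for the YAML loader: just read the whole stream.
    load_fn: Callable[[IO[str]], object] = lambda f: f.read(),
) -> int:
    """Return 1 if any file cannot be decoded as UTF-8, else 0."""
    retval = 0
    for filename in filenames:
        try:
            with open(filename, encoding='UTF-8') as f:
                load_fn(f)
        except UnicodeDecodeError as exc:
            # Prefix the message with the path so the user knows which
            # file to fix, then keep checking the remaining files.
            print(f'{filename}: not valid UTF-8 ({exc})')
            retval = 1
    return retval
```

Continuing after a failure, instead of stopping at the first bad file, means a single run lists every file that needs attention.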

I also added a new unit test to cover my changes.

Please take a look at my small changes.
I hope this is helpful, and I look forward to hearing your feedback.

@asottile
Member

I'm not sure this improves things and it changes an unrelated thing at the same time.

@mircodariol
Author

But at least it tells you which file needs to be checked.
A few days ago one of my coworkers was blocked by this issue and asked for help finding the root cause, because he could not work out where the problem was. I think that if he had seen a clearer message naming the offending file, he could have fixed the problem on his own.

What would you suggest instead? I am open to any other solutions.
