
harumaki4649

Summary

Fixed a file-reading error in utils.py by explicitly specifying UTF-8 encoding.

Details

  • Added encoding="utf-8" to the open() call that reads the file
  • Prevents decoding errors in environments whose default encoding is not UTF-8
  • Ensures consistent behavior across platforms
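A minimal sketch of the kind of change described above. The traceback in this thread shows twine's get_config passing an open file object to configparser's read_file(); the helper `read_pypirc` below is hypothetical and only mirrors that pattern, with the codec pinned to UTF-8 instead of the platform default. RawConfigParser is an assumption chosen for the sketch, not necessarily what twine uses.

```python
import configparser
import os
import tempfile


def read_pypirc(path: str) -> configparser.RawConfigParser:
    """Parse a .pypirc-style INI file, always decoding it as UTF-8.

    Hypothetical helper: without encoding=..., open() typically falls back to
    the platform default codec (for example cp932 on Japanese Windows).
    """
    parser = configparser.RawConfigParser()
    with open(path, encoding="utf-8") as f:
        parser.read_file(f)
    return parser


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, ".pypirc")
        with open(path, "w", encoding="utf-8") as f:
            # Full-line comments with Japanese text are skipped by configparser.
            f.write("[pypi]\n# 日本語のコメント\nusername = __token__\n")
        print(read_pypirc(path)["pypi"]["username"])  # __token__
```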

Impact

  • No functional changes except avoiding UnicodeDecodeError
  • Safer and more predictable file reading

@sigmavirus24
Member

Can you provide an example of when this fails today and the complete traceback?

@harumaki4649
Author

Yes, this error occurs in Japanese environments.
It is most likely caused by Japanese text in the file.

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Program Files\Python311\Scripts\twine.exe\__main__.py", line 7, in <module>
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\__main__.py", line 33, in main
    error = cli.dispatch(sys.argv[1:])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\cli.py", line 139, in dispatch
    return main(args.args)
           ^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\commands\upload.py", line 250, in main
    upload_settings = settings.Settings.from_argparse(parsed_args)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\settings.py", line 288, in from_argparse
    return cls(**settings)
           ^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\settings.py", line 116, in __init__
    self._handle_repository_options(
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\settings.py", line 304, in _handle_repository_options
    self.repository_config = utils.get_repository_from_config(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\utils.py", line 154, in get_repository_from_config
    config = get_config(config_file)[repository]
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\utils.py", line 66, in get_config
    parser.read_file(f)
  File "C:\Program Files\Python311\Lib\configparser.py", line 734, in read_file
    self._read(f, source)
  File "C:\Program Files\Python311\Lib\configparser.py", line 1037, in _read
    for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'cp932' codec can't decode byte 0x88 in position 418: illegal multibyte sequence
```
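One way to reproduce this class of failure without a Japanese-locale Windows machine is to pass encoding="cp932" explicitly as a stand-in for the platform default. The file contents below are an assumption, not the reporter's actual .pypirc: stored as UTF-8, "È" becomes the bytes C3 88, and cp932 reads 0xC3 as a single half-width katakana byte, then treats 0x88 as a lead byte whose following newline is not a valid trail byte. The helpers `parse_with` and `reproduce` are hypothetical.

```python
import configparser
import os
import tempfile
from typing import Optional, Tuple


def parse_with(path: str, encoding: str) -> configparser.RawConfigParser:
    """Parse an INI file with an explicit codec standing in for the locale default."""
    parser = configparser.RawConfigParser()
    with open(path, encoding=encoding) as f:
        parser.read_file(f)
    return parser


def reproduce() -> Tuple[Optional[str], str]:
    """Return (cp932 error reason or None, password read back as UTF-8)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, ".pypirc")
        # Written as raw UTF-8 bytes: "È" -> b"\xc3\x88", followed by b"\n".
        with open(path, "wb") as f:
            f.write("[pypi]\nusername = __token__\npassword = secrÈ\n".encode("utf-8"))

        reason = None
        try:
            parse_with(path, "cp932")
        except UnicodeDecodeError as exc:
            reason = exc.reason

        password = parse_with(path, "utf-8")["pypi"]["password"]
        return reason, password


if __name__ == "__main__":
    print(reproduce())  # ('illegal multibyte sequence', 'secrÈ')
```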

@harumaki4649
Author

Based on this error, I made a minimal change:
I explicitly specified the encoding.
I believe UTF-8 is the best choice in most cases.

@harumaki4649
Author

To explain in more detail:

This error occurs on Japanese-locale Windows systems, where the default encoding is cp932.
By explicitly specifying UTF-8, we avoid platform-dependent behavior and ensure consistent handling of .pypirc.
UTF-8 has become the de facto standard for configuration files, so I believe this change is safe and beneficial.

Member

woodruffw left a comment


Thanks @harumaki4649! It's a bummer that this kind of system codec stuff still causes issues on Windows hosts, but I see no problem with explicitly requiring UTF-8 in the config here based on your explanation.

@sigmavirus24
Member

I've only been hesitant because I expect there may be someone, somewhere, using UTF-16 or UTF-32 (rare as that is) for whom this would break.
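To make that concern concrete: a config file saved as UTF-16 cannot be decoded as UTF-8 at all, so pinning encoding="utf-8" would make such a file fail with UnicodeDecodeError. A minimal check, assuming a hypothetical .pypirc saved as UTF-16 with a byte-order mark:

```python
import configparser
import io

# Hypothetical .pypirc contents saved as UTF-16; Python's "utf-16" codec
# prepends a byte-order mark (b"\xff\xfe" or b"\xfe\xff").
raw = "[pypi]\nusername = __token__\n".encode("utf-16")

try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    # 0xFF and 0xFE never appear in valid UTF-8, so the BOM alone fails.
    decodes_as_utf8 = False

print("decodes as UTF-8:", decodes_as_utf8)  # decodes as UTF-8: False

# With the right codec, the same bytes parse fine.
parser = configparser.RawConfigParser()
parser.read_file(io.StringIO(raw.decode("utf-16")))
print(parser["pypi"]["username"])  # __token__
```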

@woodruffw
Member

> I've only been hesitant because I expect there may be someone, somewhere, using UTF-16 or UTF-32 (rare as that is) for whom this would break.

Ah yeah, that's a good point. Maybe what we could do here is attempt the read twice: once with the native code page, and then again with UTF-8 if the first attempt fails?

(Or do the read once, as bytes, and attempt decoding from bytes twice.)
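A minimal sketch of the bytes-based variant: read the file once, then try each candidate codec in turn. The names `decode_config_bytes` and `parse_config_bytes`, and the default order (locale preferred encoding first, then UTF-8), are assumptions for illustration.

```python
import configparser
import io
import locale
from typing import Optional, Sequence


def decode_config_bytes(data: bytes, encodings: Optional[Sequence[str]] = None) -> str:
    """Decode raw config bytes with the first codec that succeeds.

    Default order (an assumption): the locale's preferred encoding, then UTF-8.
    """
    if encodings is None:
        encodings = (locale.getpreferredencoding(False), "utf-8")
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error: Optional[UnicodeDecodeError] = None
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure, then try the next codec
    raise last_error  # every codec failed; surface the last error


def parse_config_bytes(
    data: bytes, encodings: Optional[Sequence[str]] = None
) -> configparser.RawConfigParser:
    parser = configparser.RawConfigParser()
    parser.read_file(io.StringIO(decode_config_bytes(data, encodings)))
    return parser


if __name__ == "__main__":
    data = "[pypi]\nusername = ユーザー\n".encode("utf-8")
    # "ascii" stands in for a native code page that rejects these bytes.
    print(parse_config_bytes(data, ("ascii", "utf-8"))["pypi"]["username"])  # ユーザー
```

One caveat for either variant: bytes written as UTF-8 can sometimes decode as cp932 without raising, yielding mojibake, so trying the native code page first does not guarantee that a UTF-8 file is read correctly.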

@sigmavirus24
Member

Yeah, that just doesn't feel great in either case.

@harumaki4649
Author

@woodruffw Should we change it to fall back to UTF-8 when a decoding error occurs?

@sigmavirus24
Member

@harumaki4649 Yes. I'd catch a specific exception (such as UnicodeDecodeError) and then retry with UTF-8.
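A minimal sketch of catching the specific exception and retrying with UTF-8. The names `load_config` and `_parse`, and the `default_encoding` parameter, are hypothetical; the parameter exists only so the fallback path can be exercised on a machine whose default codec is already UTF-8.

```python
import configparser
import os
import tempfile
from typing import Optional


def _parse(path: str, encoding: Optional[str]) -> configparser.RawConfigParser:
    parser = configparser.RawConfigParser()
    # encoding=None lets open() use the platform default codec.
    with open(path, encoding=encoding) as f:
        parser.read_file(f)
    return parser


def load_config(path: str, default_encoding: Optional[str] = None) -> configparser.RawConfigParser:
    """Parse with the default codec first; retry with UTF-8 only on a decode error."""
    try:
        return _parse(path, default_encoding)
    except UnicodeDecodeError:
        # Narrow catch: a missing file or bad INI syntax still raises as before.
        return _parse(path, "utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, ".pypirc")
        with open(path, "wb") as f:
            f.write("[pypi]\nusername = ユーザー\n".encode("utf-8"))
        # "ascii" forces the first attempt to fail, exercising the fallback.
        print(load_config(path, default_encoding="ascii")["pypi"]["username"])  # ユーザー
```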

@harumaki4649
Author

harumaki4649 commented Oct 2, 2025

I’ve applied the improvements based on your review comments.
Could you please take another look and let me know if further adjustments are needed?

harumaki4649 requested a review from woodruffw on October 3, 2025 10:37
Refactor configuration file parsing to use helper functions for better error handling and readability.
Refactor error handling for configuration file parsing.
@harumaki4649
Author

Based on the latest feedback, I have made further revisions.
Please take another look.

Member

sigmavirus24 left a comment


Now all we need is a few tests that trigger this.

@sigmavirus24
Member

@harumaki4649 Really, all you need to do is provide an INI file that contains a character outside of cp932, and in pytest use https://docs.python.org/3/library/locale.html to try to override the default behaviour in the test.
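A minimal pytest sketch along those lines. Overriding the locale inside a test process is platform-dependent, and open() may not consult a monkeypatched locale.getpreferredencoding at runtime (worth verifying on the Python versions twine supports), so this sketch instead injects the codec through a hypothetical `default_encoding` parameter on a stand-in `load_config`. The INI content is an assumption: stored as UTF-8, "È" at the end of a line produces bytes that cp932 rejects.

```python
import configparser
from pathlib import Path
from typing import Optional

# Stored as UTF-8, "È" is b"\xc3\x88": cp932 treats 0x88 as a lead byte and
# then rejects the newline after it, so a cp932 read of this file fails.
PYPIRC = "[pypi]\nusername = __token__\npassword = secrÈ\n"


def load_config(path: Path, default_encoding: Optional[str] = None) -> configparser.RawConfigParser:
    """Stand-in for the patched helper (hypothetical name and signature)."""

    def parse(encoding: Optional[str]) -> configparser.RawConfigParser:
        parser = configparser.RawConfigParser()
        with open(path, encoding=encoding) as f:
            parser.read_file(f)
        return parser

    try:
        return parse(default_encoding)
    except UnicodeDecodeError:
        return parse("utf-8")


def write_pypirc(tmp_path: Path) -> Path:
    path = tmp_path / ".pypirc"
    path.write_bytes(PYPIRC.encode("utf-8"))  # bytes: no newline translation
    return path


def test_cp932_cannot_read_file(tmp_path: Path) -> None:
    path = write_pypirc(tmp_path)
    try:
        path.read_text(encoding="cp932")
    except UnicodeDecodeError:
        return
    raise AssertionError("expected cp932 to reject this file")


def test_falls_back_to_utf8(tmp_path: Path) -> None:
    path = write_pypirc(tmp_path)
    config = load_config(path, default_encoding="cp932")
    assert config["pypi"]["password"] == "secrÈ"
```

Running `pytest` on this file lets the built-in `tmp_path` fixture supply a fresh temporary directory for each test.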

@harumaki4649
Author

I ran the tests in my local environment. I'm not entirely sure whether these are the tests you expected, but would this be acceptable? This was my first time performing this type of task, so I apologize in advance if I made any mistakes. Is there anything else you would like me to check?


3 participants