
Do not issue warning when calling set_encoding if string is chilled #128


Open
wants to merge 2 commits into master

Conversation

jeremyevans
Contributor

StringIO does not warn for unchilled unfrozen string or for frozen string, so it should not warn for chilled string.

Co-authored-by: Sutou Kouhei <[email protected]>
@tomhughes

Doesn't this change the semantics of StringIO when used with what is (for now) not actually a frozen string?

As I understand it, currently if you're only reading the string, the encoding is used to determine how to interpret the data but the encoding of the underlying string is not changed. If you're writing, however, the encoding of the underlying string is changed, which matters if you then try to use the string you've filled via the StringIO handle.

With this change, if you use StringIO to write to a constant but non-frozen string literal (which is entirely valid for now, if deprecated), then the encoding will be used to do the write but the resulting string will not have the right encoding.
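
For concreteness, a minimal sketch of the current behavior I mean (the output comments reflect my understanding of stringio today, not anything from this PR):

require 'stringio'

x = +"data"               # plain unfrozen buffer, so StringIO opens it writable
s = StringIO.new(x)
p x.encoding              # UTF-8
s.set_encoding("BINARY")  # today this also re-tags the buffer string itself
p x.encoding              # ASCII-8BIT (BINARY)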

At the very least it sounds like it would be a semver major change?

@jeremyevans
Contributor Author

Doesn't this change the semantics of StringIO when used with what is (for now) not actually a frozen string?

With this change, StringIO#set_encoding will no longer attempt to change the encoding of the underlying string for chilled strings. I don't consider that a change in semantics, because this StringIO behavior is not mentioned in the documentation; it is an implementation detail. As StringIO already does not do this for frozen strings, not doing it for chilled strings seems reasonable. After all, when Ruby switches literal strings from chilled by default to frozen by default, this will happen automatically anyway, so this just does it a little earlier to avoid the deprecation warning.
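
For comparison, here is a minimal sketch of the existing frozen-string behavior this mirrors (the comments reflect my reading of stringio, not anything added by this PR):

require 'stringio'
Warning[:deprecated] = true

x = "\u1234".freeze       # frozen, so StringIO opens it read-only
s = StringIO.new(x)
s.set_encoding("BINARY")  # no warning, and the buffer string is left alone
p x.encoding              # still UTF-8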

As I understand it currently if you're only reading the string the encoding is used to determine how to interpret the data but the encoding of the underlying string is not changed, but if you're writing then the encoding of the underlying string is changed, which will be important if you then try and use the string you've filled via the StringIO handle.

With this change if you use StringIO to write to a constant, but non-frozen, string literal (which is entirely valid for now, if deprecated) then the encoding will be used to do the write but the resulting string will not have the right encoding.

Note that writing to a StringIO backed by a chilled string is still deprecated with this commit. The only change is when you get the deprecation warning. Instead of getting it in set_encoding, you get it when you try to write to the string:

require 'stringio'
Warning[:deprecated] = true

x = "\u1234"              # chilled string literal (no frozen_string_literal comment)
s = StringIO.new(x)       # not frozen, so the stream is opened writable
p x.encoding
s.set_encoding("BINARY")  # with this change: no warning, buffer left untouched
p x.encoding
p x.valid_encoding?
s << "\x89"               # the deprecation warning now happens here instead
p x.valid_encoding?

output:

#<Encoding:UTF-8>
#<Encoding:UTF-8>
true
t.rb:10: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)
false

If you only read and call set_encoding on the StringIO, then you no longer get a warning, just as you would not get a warning for a frozen string.
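
For example, a sketch of that read-only usage (assuming this patch is applied):

require 'stringio'
Warning[:deprecated] = true

x = "\u1234"              # chilled literal, opened writable by default
s = StringIO.new(x)
s.set_encoding("BINARY")  # with this patch: no deprecation warning here
p s.read.bytesize         # 3; we only read, so x is never mutated
p x.encoding              # stays UTF-8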

You are correct that the encoding of the string becomes invalid. However, StringIO will modify strings in ways that result in invalid encodings even when it does change the encoding of the underlying string:

require 'stringio'

Warning[:deprecated] = true
x = "\u1234".b            # explicit BINARY copy, so not chilled and no warning
s = StringIO.new(x)
p x.encoding
s.set_encoding("UTF-8")   # writable stream, so the buffer string is re-tagged
p x.encoding
p x.valid_encoding?
s << "\x89"               # appends a byte that is invalid in UTF-8
p x.valid_encoding?

output:

#<Encoding:BINARY (ASCII-8BIT)>
#<Encoding:UTF-8>
true
false

At the very least it sounds like it would be a semver major change?

If StringIO raised an error when asked to write a string with an invalid encoding, then I guess you could argue that. However, since it does not, and misuse of StringIO can result in invalid string encodings even in the case where it does change the encoding of the underlying string, this doesn't sound like a semver major change to me.

@kou
Member

kou commented Apr 21, 2025

@tomhughes Do you have any more opinions before we merge this?

@rhenium
Member

rhenium commented Apr 21, 2025

FWIW, StringIO#set_encoding_by_bom and StringIO#binmode similarly set the encoding of the underlying string.
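
For instance, a minimal sketch of the binmode case (this shows my understanding of the current behavior; it is not something changed by this PR):

require 'stringio'

x = +"abc"                # writable buffer
s = StringIO.new(x)
s.binmode                 # like set_encoding, this re-tags the buffer when writable
p x.encoding              # ASCII-8BIT (BINARY)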

The current behavior of StringIO#set_encoding comes from [Bug #11827]. The NEWS for Ruby 2.3 gives a bit more detail:

In read-only mode, StringIO#set_encoding no longer sets the encoding of its buffer string. Setting the encoding of the string directly without StringIO#set_encoding may cause unpredictable behavior now. [Bug #11827]

Since StringIO backed by a chilled string allows writing, my interpretation is that calling #set_encoding should also change the encoding of the underlying string. The warning is unfortunate, but it seems correct to me in this case.

@tomhughes

@tomhughes Do you have any more opinions before we merge this?

I'm no expert, so I'm happy to defer to others; I was just trying to make sure people had considered this and whether it was reasonable or would break people's expectations.
