emwl commented Jun 24, 2025

The most common way of switching the encoding is through escape sequences at the start of the string. While the specification goes into a lot of detail about different character sets in G0 through G3, the C0/C1 control sets and so on, the majority of files I've seen so far (including the ones from the WebCGM Test Suite) simply use DOCS (DESIGNATE OTHER CODING SYSTEM) as described in ISO/IEC 2022 and ECMA-35. At least, I haven't seen any files that use CHARACTER SET LIST and CHARACTER SET INDEX so far.
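For reference, a DOCS sequence is ESC followed by `%` and a final byte, or by `%`, `/` and a final byte for coding systems without a standard return. The Python sketch below only detects such a prefix at the start of a string and picks a codec for the remaining bytes. `split_docs` and `DOCS_CODECS` are hypothetical names; the table is my own assumption based on the ISO-IR registrations for ISO/IEC 10646 (worth double-checking against the registry), and it also assumes big-endian byte order for the UCS-2, UTF-16 and UCS-4 forms.

```python
ESC = 0x1B

# Assumed mapping from the bytes following ESC in a DOCS sequence to a
# Python codec name (based on the ISO-IR registrations; verify before use).
# Multi-byte forms are assumed to be big-endian.
DOCS_CODECS = {
    b"%G": "utf-8",       # UTF-8 with standard return
    b"%/G": "utf-8",      # UTF-8, levels 1-3
    b"%/H": "utf-8",
    b"%/I": "utf-8",
    b"%/@": "utf-16-be",  # UCS-2, levels 1-3 (decoded as UTF-16BE)
    b"%/C": "utf-16-be",
    b"%/E": "utf-16-be",
    b"%/J": "utf-16-be",  # UTF-16, levels 1-3
    b"%/K": "utf-16-be",
    b"%/L": "utf-16-be",
    b"%/A": "utf-32-be",  # UCS-4, levels 1-3
    b"%/D": "utf-32-be",
    b"%/F": "utf-32-be",
}


def split_docs(data: bytes):
    """Return (codec, payload) if data starts with a known DOCS sequence,
    otherwise (None, data) unchanged."""
    if data[:1] != bytes([ESC]):
        return None, data
    for seq, codec in DOCS_CODECS.items():
        if data[1:1 + len(seq)] == seq:
            return codec, data[1 + len(seq):]
    return None, data
```

Strings without a recognised sequence come back as `(None, data)`, so a caller can keep using whatever encoding was active before.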

With the first commit, the CGM object keeps track of the currently active encoding, starting out with ISO-8859-1 (as ISO/IEC 8632-1 §6.3.4.5 indicates, and as the old code did). The fallback is still there, so even if this fails, we just get the same result as before (which is likely mojibake, since each byte is decoded on its own).
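A minimal sketch of that idea, assuming the active encoding persists from one string to the next until another DOCS sequence changes it: the hypothetical `StringDecoder` starts in ISO-8859-1, switches when a string begins with a recognised sequence, and falls back to decoding the original bytes as ISO-8859-1 whenever they don't fit the active encoding. The codec table is a small assumed subset, and treating `ESC % @` as a return to the ISO-8859-1 default is a simplification made for this sketch.

```python
ESC = b"\x1b"

# Assumed subset of DOCS sequences (bytes after ESC) and their codecs.
# Treating ESC % @ (return to ISO/IEC 2022) as "back to the ISO-8859-1
# default" is a simplification made for this sketch.
DOCS_CODECS = {
    b"%@": "iso-8859-1",
    b"%G": "utf-8",
    b"%/G": "utf-8",
    b"%/@": "utf-16-be",
    b"%/J": "utf-16-be",
}


class StringDecoder:
    """Hypothetical decoder that remembers the active encoding across
    strings, starting with ISO-8859-1."""

    def __init__(self) -> None:
        self.encoding = "iso-8859-1"

    def decode(self, raw: bytes) -> str:
        data = raw
        if data.startswith(ESC):
            for seq, codec in DOCS_CODECS.items():
                if data[1:1 + len(seq)] == seq:
                    self.encoding = codec  # the switch persists for later strings
                    data = data[1 + len(seq):]
                    break
        try:
            return data.decode(self.encoding)
        except UnicodeDecodeError:
            # Fallback: the old behaviour, one ISO-8859-1 character per byte
            return raw.decode("iso-8859-1")
```

Because the fallback decodes the untouched input bytes as ISO-8859-1, a string that doesn't match the active encoding produces the same text it would have produced without any encoding tracking.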

And while trying to verify this with Analyzer, I found the log output wasn't really useful, so the second commit makes those output files use UTF-8 instead. That way, multi-byte characters (and text in encodings other than ASCII/ISO-8859-1) show up correctly there, while the output stays mostly the same as before when it has no multi-byte characters (ASCII-only output is byte-for-byte identical).
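A quick way to see why switching the output to UTF-8 mostly leaves existing logs unchanged: ASCII text encodes to identical bytes in UTF-8 and ISO-8859-1, ISO-8859-1 characters above 0x7F become two bytes in UTF-8, and characters outside ISO-8859-1 (such as Japanese) can only be written in UTF-8. `encodings_agree` is a hypothetical helper for illustration only.

```python
def encodings_agree(text: str) -> bool:
    """True if writing `text` as UTF-8 gives the same bytes as ISO-8859-1."""
    try:
        return text.encode("utf-8") == text.encode("iso-8859-1")
    except UnicodeEncodeError:
        # Not representable in ISO-8859-1 at all (e.g. Japanese text)
        return False
```

For example, `"FONT LIST"` agrees, while `"café"` (é is `E9` in ISO-8859-1 but `C3 A9` in UTF-8) and `"日本"` do not.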

I'm not too sure about the third commit, but I noticed the issue in the WebCGM Test Suite files, especially the ones created with IsoDraw. FONT LIST takes a list of String-Fixed values (see ISO/IEC 8632-1 §7.3.13), but those files use the same multi-byte encoding a regular String would. https://github.com/BhaaLseN/CgmInfo/ also reads it as a regular string.
I don't think there are any downsides to this, because the fallback is still there; just let me know whether you want to keep that last commit.
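As a sketch of what reading FONT LIST entries as regular strings could look like, assuming the binary encoding (ISO/IEC 8632-3) stores each string as a count byte followed by its bytes, with a count of 255 introducing a long form: the hypothetical `read_string_list` splits a parameter block into entries and hands each one to `decode_regular`, which honours a leading DOCS switch. Both function names, the two-entry codec table and the restriction to short-form strings are assumptions for illustration; entries that fail to decode keep the previous ISO-8859-1 result.

```python
def decode_regular(data: bytes) -> str:
    """Decode one entry like a regular String: honour a leading DOCS
    switch (assumed subset), otherwise fall back to ISO-8859-1."""
    prefixes = {b"\x1b%G": "utf-8", b"\x1b%/@": "utf-16-be"}
    for prefix, codec in prefixes.items():
        if data.startswith(prefix):
            try:
                return data[len(prefix):].decode(codec)
            except UnicodeDecodeError:
                break  # fall through to the old behaviour
    return data.decode("iso-8859-1")


def read_string_list(params: bytes, decode=decode_regular) -> list:
    """Split a parameter block of count-prefixed strings (short form only,
    count < 255, no trailing pad byte) and decode each entry."""
    names = []
    pos = 0
    while pos < len(params):
        length = params[pos]
        if length == 255:
            raise NotImplementedError("long-form strings are not handled")
        pos += 1
        names.append(decode(params[pos:pos + length]))
        pos += length
    return names
```

Passing a different `decode` function (for example a plain ISO-8859-1 decode) gives the String-Fixed reading back, which makes it easy to compare both interpretations on the same file.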

Also, a quick disclaimer: I mainly used Analyzer and its text output to test this; I haven't yet tried to render an image with it. But I assume the drawing routines do the right thing there.

emwl added 3 commits June 22, 2025 14:43
Some metafile generators (such as IsoDraw) write FONT LIST entries as regular strings,
using the current encoding.
makeString handles this well enough to just switch over to it.
emwl (Author) commented Jun 26, 2025

It took me a bit longer than anticipated, but I finally got around to testing this visually:
[Screenshot: utf16-japanese-10]

Not sure what you think, but I'd argue that's an improvement :)
