Add support for UTF-8/UTF-16 strings through DOCS #18

The most common way of switching the encoding is an ESC sequence in front of the string. While the specification goes into a lot of detail about designating different character sets to G0 through G3, the C0/C1 control sets and so on, the majority of files I've seen so far (including the ones from the WebCGM Test Suite) simply use DOCS (DESIGNATE OTHER CODING SYSTEM) as ISO/IEC 2022 and ECMA-35 describe it. At least I haven't seen any files that use `CHARACTER SET LIST` and `CHARACTER SET INDEX` so far.

With the first commit, the `CGM` object keeps track of the currently active encoding, starting out with ISO-8859-1 (as ISO/IEC 8632-1 §6.3.4.5 indicates, and as the old code did). The fallback is still there, so even when this fails, we just get the same result as before (which is likely garbage/mojibake that shows the individual bytes instead).

While trying to verify this with `Analyzer`, the log output wasn't really useful, so the second commit makes those output files use UTF-8 instead. That way, any multi-byte values (or other encodings that aren't ASCII/ISO-8859-1) show up correctly there, while the output stays mostly the same as before if it has no multi-byte characters.

I'm not too sure about the third commit, but I noticed this in the WebCGM Test Suite files, especially the ones created with IsoDraw. `FONT LIST` has a list of String-Fixed values (see ISO/IEC 8632-1 §7.3.13), but those files use the same multi-byte encoding as a regular String would. https://github.com/BhaaLseN/CgmInfo/ also uses a regular string for it.

I don't think there are any downsides to this, because the fallback is still there; just let me know if you want to keep that last commit or not.

Also, a quick disclaimer: I mainly used `Analyzer` and the text output to test this; I haven't yet tried to render an image with it. But I assume that the drawing routines should do the right thing there.
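
For reviewers who want to see the idea behind the first commit in isolation, here is a minimal Python sketch of DOCS handling with the ISO-8859-1 fallback. It is not the code from this pull request: the names `DOCS_ENCODINGS` and `decode_cgm_string` are hypothetical, the table only covers the UTF-8 and UTF-16 escape sequences registered for ISO/IEC 2022, and treating UTF-16 as big-endian is an assumption of the sketch.

```python
from typing import Tuple

DEFAULT_ENCODING = "iso-8859-1"

# DOCS escape sequences (ESC 2/5 ...) mapped to Python codec names.
# Illustrative subset only; UTF-16 is assumed to be big-endian here.
DOCS_ENCODINGS = {
    b"\x1b%G": "utf-8",       # UTF-8, with standard return
    b"\x1b%/G": "utf-8",      # UTF-8 level 1
    b"\x1b%/H": "utf-8",      # UTF-8 level 2
    b"\x1b%/I": "utf-8",      # UTF-8 level 3
    b"\x1b%/J": "utf-16-be",  # UTF-16 level 1
    b"\x1b%/K": "utf-16-be",  # UTF-16 level 2
    b"\x1b%/L": "utf-16-be",  # UTF-16 level 3
    b"\x1b%@": DEFAULT_ENCODING,  # return to ISO 2022; mapped to the default here
}


def decode_cgm_string(raw: bytes, current: str = DEFAULT_ENCODING) -> Tuple[str, str]:
    """Decode one string and return (text, active_encoding).

    A leading DOCS sequence switches the active encoding. The returned
    encoding is meant to be passed back in for the next string, so the
    switch stays in effect until another DOCS sequence shows up.
    """
    payload = raw
    for sequence, encoding in DOCS_ENCODINGS.items():
        if raw.startswith(sequence):
            current = encoding
            payload = raw[len(sequence):]
            break
    try:
        return payload.decode(current), current
    except UnicodeDecodeError:
        # Fallback: decode the untouched bytes as ISO-8859-1, which never
        # fails and gives the same (possibly garbled) result as before.
        return raw.decode(DEFAULT_ENCODING), current
```

Usage would look roughly like this, carrying the active encoding from one string to the next:

```python
text, active = decode_cgm_string(b"\x1b%G" + "Grüße".encode("utf-8"))
next_text, active = decode_cgm_string("weiß".encode("utf-8"), active)
```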