I know, I know... bad content is bad content. But this is a difference in behavior from MRI.
Here's the case, again a reduced version of one I got from @rkh:
# encoding: utf-8
require 'json'
x = "{\"foo\":\"\xC3\"}"
h = JSON.parse(x)
p h['foo']
p h['foo'].encoding
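So basically there's a bad byte in a UTF-8 string: \xC3 is the lead byte of a two-byte sequence, but it is followed by the closing quote rather than a continuation byte. The MRI version walks right past it and lets it through into the resulting parsed JSON structure.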
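For anyone who needs to get past this on the caller side, here is a minimal sketch that detects and replaces the bad byte before parsing. It assumes a Ruby with `String#scrub` (2.1 or later), and it is not equivalent to what MRI does here: the invalid byte is replaced with U+FFFD rather than passed through. The variable names are only for illustration.

```ruby
require 'json'

# Same payload as in the report: 0xC3 opens a two-byte UTF-8 sequence,
# but the next byte is the closing quote, not a continuation byte.
raw = "{\"foo\":\"\xC3\"}"

# Detect the problem before handing the string to the parser.
p raw.valid_encoding?        # => false

# Replace each invalid byte sequence with U+FFFD (REPLACEMENT CHARACTER).
# Assumption: losing the original byte is acceptable for the caller.
cleaned = raw.scrub("\u{FFFD}")
p cleaned.valid_encoding?    # => true

h = JSON.parse(cleaned)
p h['foo'] == "\u{FFFD}"     # => true
```

This only sidesteps the difference; it does not make the two parsers agree on the original input.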
I have a totally broken patch for this:
diff --git a/java/src/json/ext/ByteListTranscoder.java b/java/src/json/ext/ByteListTranscoder.java
index ed9e54b..a7e42ba 100644
--- a/java/src/json/ext/ByteListTranscoder.java
+++ b/java/src/json/ext/ByteListTranscoder.java
@@ -78,9 +78,10 @@ abstract class ByteListTranscoder {
return head;
}
if (head <= 0xbf) { // 0b10xxxxxx
- throw invalidUtf8(); // tail byte with no head
+ return head; //throw invalidUtf8(); // tail byte with no head
}
if (head <= 0xdf) { // 0b110xxxxx
+ if (pos + 1 > srcEnd) return head;
ensureMin(1);
int cp = ((head & 0x1f) << 6)
| nextPart();
Again, I'm not sure this actually needs to be fixed, but since the MRI version of json does not blow up on this content, the difference should at least be addressed.
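To compare implementations without the script dying on one of them, something like the following could be run under each Ruby. It is only a sketch: `parse_outcome` is a hypothetical helper, and it rescues `StandardError` broadly because I am not assuming which exception class a given implementation raises.

```ruby
require 'json'

# Hypothetical helper: returns [:ok, parsed] when parsing succeeds,
# or [:error, exception_class_name] when the parser raises.
def parse_outcome(source)
  [:ok, JSON.parse(source)]
rescue StandardError => e
  [:error, e.class.name]
end

# Payload from the report: a lone 0xC3 lead byte inside a JSON string.
bad = "{\"foo\":\"\xC3\"}"

status, detail = parse_outcome(bad)
puts "#{RUBY_ENGINE} #{RUBY_VERSION}: #{status} #{detail.inspect}"
```

Running it under MRI and JRuby side by side shows whether each one lets the byte through or raises, and with which exception.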