When the attached UTF-8 text file (Unicode-Test.txt) is imported into CorefAnnotator and then saved, the attached XMI file is generated (Unicode-Test-xmi.txt, originally Unicode-Test.xmi, but GitHub does not allow me to upload .xmi files), which in turn cannot be opened again:
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2672; Character reference "�" is an invalid XML character.
(The same error occurs when trying to load that file in a different program with Java’s SAX parser for XML.)
There is only one Unicode character in the text file: 😂 U+1F602 FACE WITH TEARS OF JOY
This character is displayed correctly in the editor window after importing the text file; just saving it does not seem to work. Judging from the column number given in the error message, the problem lies in the sofaString of the following sofa:
Since U+1F602 is a code point outside the Basic Multilingual Plane (BMP), Java’s internal String representation (UTF-16) needs two chars to represent it. It looks like those two chars are escaped individually, which seems to be invalid in XML.
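To illustrate the suspected failure mode, here is a minimal sketch that contrasts escaping a string one UTF-16 char at a time with escaping it one code point at a time. The class and method names are made up for this example; this is not the actual serializer code from CorefAnnotator, UIMA, or Xalan, and it deliberately ignores other XML escaping rules such as `&` and `<`.

```java
public class SurrogateEscapeDemo {

    // Escapes every non-ASCII UTF-16 char on its own. For a character outside
    // the BMP this emits one reference per surrogate, which XML parsers reject.
    public static String escapePerChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escapes every non-ASCII code point. A surrogate pair is combined into
    // a single code point first, so only one reference is emitted for it.
    public static String escapePerCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String joy = "\uD83D\uDE02"; // U+1F602 as a surrogate pair
        System.out.println(joy.length());                        // 2
        System.out.println(joy.codePointCount(0, joy.length())); // 1
        System.out.println(escapePerChar(joy));      // &#55357;&#56834;
        System.out.println(escapePerCodePoint(joy)); // &#128514;
    }
}
```

The per-char output contains references to the lone surrogates U+D83D and U+DE02, which are not legal XML characters; the per-code-point output is a single reference to U+1F602.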
When using Java’s javax.xml.transform.Transformer to create an XML file for an org.w3c.dom.Document where the value of an attribute is set to U+1F602 (that is, to "\uD83D\uDE02"), that attribute value becomes "😂", so I think the above sofa should look like this:
This occurred in CorefAnnotator 1.14.3 (the version shown in the stack trace below) with Java 13; the javax.xml.transform.Transformer test program produced the above-mentioned output with both Java 13 and Java 8.
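For reference, a test program along these lines might look roughly like the following sketch. The class name, the element name `sofa`, and the attribute name `sofaString` are simplified stand-ins chosen for this example, not the real XMI structure, and the exact serialized form depends on which Transformer implementation is on the classpath. The round trip at the end only checks that the output can be parsed again and yields the original value.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TransformerRoundTrip {

    // Builds a one-element document (hypothetical "sofa" element) and
    // serializes it with the default Transformer.
    public static String serialize(String attributeValue) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element root = doc.createElement("sofa");
            root.setAttribute("sofaString", attributeValue);
            doc.appendChild(root);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Parses the XML again and returns the attribute value; a parser error
    // here would correspond to the SAXParseException reported above.
    public static String parseAttribute(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("sofaString");
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String joy = "\uD83D\uDE02"; // U+1F602
        String xml = serialize(joy);
        System.out.println(xml);
        System.out.println("round trip ok: " + joy.equals(parseAttribute(xml)));
    }
}
```

If the serializer in use escaped the two surrogates individually, `parseAttribute` would be expected to fail in the same way as loading the saved XMI file does.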
Full stack trace of the exception:
java.io.IOException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2672; Character reference "�" is an invalid XML character.
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
at javax.swing.SwingWorker.get(SwingWorker.java:613) ~[?:?]
at de.unistuttgart.ims.coref.annotator.worker.JCasLoader.done(JCasLoader.java:147) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at javax.swing.SwingWorker$5.run(SwingWorker.java:750) ~[?:?]
at javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.run(SwingWorker.java:847) ~[?:?]
at sun.swing.AccumulativeRunnable.run(AccumulativeRunnable.java:112) ~[?:?]
at javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.actionPerformed(SwingWorker.java:857) ~[?:?]
at javax.swing.Timer.fireActionPerformed(Timer.java:317) ~[?:?]
at javax.swing.Timer$DoPostEvent.run(Timer.java:249) ~[?:?]
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:313) ~[?:?]
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:770) ~[?:?]
at java.awt.EventQueue$4.run(EventQueue.java:721) ~[?:?]
at java.awt.EventQueue$4.run(EventQueue.java:715) ~[?:?]
at java.security.AccessController.doPrivileged(AccessController.java:391) [?:?]
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85) [?:?]
at java.awt.EventQueue.dispatchEvent(EventQueue.java:740) [?:?]
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203) [?:?]
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124) [?:?]
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113) [?:?]
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109) [?:?]
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101) [?:?]
at java.awt.EventDispatchThread.run(EventDispatchThread.java:90) [?:?]
Caused by: java.io.IOException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2672; Character reference "�" is an invalid XML character.
at de.unistuttgart.ims.coref.annotator.plugins.DefaultImportPlugin.getJCas(DefaultImportPlugin.java:87) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at de.unistuttgart.ims.coref.annotator.worker.JCasLoader.readFile(JCasLoader.java:104) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at de.unistuttgart.ims.coref.annotator.worker.JCasLoader.doInBackground(JCasLoader.java:139) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at de.unistuttgart.ims.coref.annotator.worker.JCasLoader.doInBackground(JCasLoader.java:33) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at javax.swing.SwingWorker$1.call(SwingWorker.java:304) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at javax.swing.SwingWorker.run(SwingWorker.java:343) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
Caused by: org.xml.sax.SAXParseException: Character reference "�" is an invalid XML character.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:2066) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1983) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at de.unistuttgart.ims.coref.annotator.plugins.DefaultImportPlugin.getJCas(DefaultImportPlugin.java:84) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at de.unistuttgart.ims.coref.annotator.worker.JCasLoader.readFile(JCasLoader.java:104) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at de.unistuttgart.ims.coref.annotator.worker.JCasLoader.doInBackground(JCasLoader.java:139) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at de.unistuttgart.ims.coref.annotator.worker.JCasLoader.doInBackground(JCasLoader.java:33) ~[CorefAnnotator-1.14.3-full.jar:1.14.3]
at javax.swing.SwingWorker$1.call(SwingWorker.java:304) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at javax.swing.SwingWorker.run(SwingWorker.java:343) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
Some googling suggests that many people experience similar problems due to this bug in Xalan. Does your code use Xalan (maybe indirectly through UIMA)?