[jdom-interest] JDOM Exception: invalid XML character (Unicode: 0xb) found

Charlie Wu cwu at brocade.com
Tue Jul 23 18:32:13 PDT 2002


Hi Dennis:

Thanks for the response. So 0xB is a valid UTF-8 character but not a
valid XML character?

The other question, then, is: can I read my XML file as a stream, check it
byte by byte, and remove anything below 0x20 (except the three you
mentioned)? Would this be a problem for UTF-8, since its characters can be
multi-byte?
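For UTF-8 specifically, byte-level filtering of these control characters is safe: every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so a byte in the range 0x00-0x1F is always a complete single-byte character and never part of a longer one. The same holds for single-byte encodings such as ISO-8859-1, but not for UTF-16. A minimal sketch follows; the class and method names are hypothetical, and it only removes the C0 control characters, not other characters XML forbids (such as U+FFFE, which is multi-byte in UTF-8).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class Utf8ControlByteFilter {
    private Utf8ControlByteFilter() {}

    // Copies bytes from in to out, dropping C0 control bytes (0x00-0x1F)
    // except tab (0x09), line feed (0x0A) and carriage return (0x0D).
    // Correct for UTF-8 and single-byte encodings; NOT for UTF-16.
    public static void copyWithoutControlBytes(InputStream in, OutputStream out)
            throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            int kept = 0;
            for (int i = 0; i < n; i++) {
                int b = buf[i] & 0xFF;          // treat the byte as unsigned
                if (b >= 0x20 || b == 0x09 || b == 0x0A || b == 0x0D) {
                    buf[kept++] = buf[i];       // compact kept bytes in place
                }
            }
            out.write(buf, 0, kept);
        }
    }

    // Convenience wrapper for in-memory data.
    public static byte[] filter(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copyWithoutControlBytes(new ByteArrayInputStream(utf8), out);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory streams
        }
        return out.toByteArray();
    }
}
```

Multi-byte characters such as "é" (bytes 0xC3 0xA9) pass through untouched, because neither byte is below 0x20.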

Thanks

Charlie

-----Original Message-----
From: Dennis Sosnoski [mailto:dms at sosnoski.com]
Sent: Tuesday, July 23, 2002 4:58 PM
To: Charlie Wu
Cc: 'jdom-interest at jdom.org'
Subject: Re: [jdom-interest] JDOM Exception: invalid XML character (Unicode: 0xb) found


The problem is that you have a 0xB character in the data. This is an 
illegal XML character: it doesn't matter what encoding you use, it simply 
cannot be present in a legal XML 1.0 document. The ASCII control 
characters below 0x20 are prohibited in XML, with only three exceptions: 
0x9, 0xA, and 0xD (tab, line feed, and carriage return, respectively).
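The rule above covers the values below 0x20. The complete XML 1.0 `Char` production also excludes the surrogate range 0xD800-0xDFFF and the values 0xFFFE and 0xFFFF. As a sketch, it can be written as a Java predicate on code points; the class name `XmlChars` is only for illustration.

```java
public final class XmlChars {
    private XmlChars() {}

    // True if the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9                     // tab
                || codePoint == 0xA                 // line feed
                || codePoint == 0xD                 // carriage return
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        for (int cp : new int[] {0x9, 0xB, 0x20, 0xFFFE}) {
            System.out.printf("U+%04X legal: %b%n", cp, isXmlChar(cp));
        }
    }
}
```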

To make this an XML document, you'll need to remove these characters from 
the data. I'm surprised XMLSpy doesn't complain about this; it is a 
well-formedness error, which any conforming XML parser must report.
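A minimal sketch of both points, assuming the text is already decoded into a Java `String`: a regular expression removes every character outside the XML 1.0 `Char` production, and the JDK's built-in JAXP parser (used here instead of JDOM so the example needs no extra libraries) shows that the raw 0xB is rejected while the stripped text parses. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public final class XmlStripDemo {
    private XmlStripDemo() {}

    // Matches any code point outside the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Removes every character that may not appear in an XML 1.0 document.
    public static String stripInvalidXmlChars(String text) {
        return INVALID_XML_CHARS.matcher(text).replaceAll("");
    }

    // True if the JDK's built-in parser accepts the text as well-formed XML.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. "An invalid XML character (Unicode: 0xb) ..."
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        String raw = "<a>x" + (char) 0x0B + "y</a>";
        System.out.println(isWellFormed(raw));                       // false
        System.out.println(isWellFormed(stripInvalidXmlChars(raw))); // true
    }
}
```

Removing characters silently changes the data; replacing them with a placeholder such as a space is an equally simple alternative if the positions matter.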

  - Dennis

Charlie Wu wrote:

>By the way, I created the XML file using UTF-8 encoding. The code snippet
>is pasted here:
>
>// Get response data.
>InputStreamReader ir = new InputStreamReader(connection.getInputStream(), "UTF8");
>BufferedReader d = new BufferedReader(ir);
>
>String str;
>// System.out.println(ir.getEncoding());
>
>Writer out1 = new OutputStreamWriter(System.out, "UTF8");
>out1.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
>
>// Copy the response line by line; every character, including 0xB,
>// is written through unchanged.
>while ((str = d.readLine()) != null)
>{
>    out1.write(str);
>    out1.write("\n");
>    // System.out.println(str);
>}
>d.close();
>out1.flush();
>out1.close();
>
>
>
>-----Original Message-----
>From: Charlie Wu [mailto:cwu at brocade.com]
>Sent: Tuesday, July 23, 2002 12:36 PM
>To: 'jdom-interest at jdom.org'
>Subject: [jdom-interest] JDOM Exception: invalid XML character (Unicode:
>0xb) found
>
>
>Hi all:
>
>Please help me with this. I've spent 2 days on it with no success.
>Here's my problem:
>
>I run a Java program that queries a remote server and stores the result in
>an XML file with the encoding set to UTF-8. (I also tried ISO-8859-1, with
>no success.)
>
>The XML loads fine in XMLSpy without any complaints about illegal
>characters. (If I set the encoding in the XML header to ISO-8859-1, XMLSpy
>warns about characters not present in the ISO-8859-1 encoding.)
>
>Then I run another Java program that tries to build the document with
>SAXBuilder, and that's where I get the error shown below.
>
>I have searched the JDOM archive and read the FAQ, but still couldn't get
>it working :(
>
>Can someone please give me a hand?
>
>Thanks a million
>
>Charlie
>
>Exception eorg.jdom.JDOMException: Error on line 1588 of document
>file:/c:/cwu/cosmos/test2.xml: 
>An invalid XML character (Unicode: 0xb) was found in the element content of
>the document.
>org.jdom.JDOMException: Error on line 1588 of document
>file:/c:/cwu/cosmos/test2.xml: 
>
>An invalid XML character (Unicode: 0xb) was found in the element content of
>the document.
>        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:363)
>        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:707)
>        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:689)
>        at AllegisReader.read(AllegisReader.java:72)
>        at AllegisReader.main(AllegisReader.java:689)
>Caused by: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xb) was found in the element content of the document.
>        at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1213)
>        at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:588)
>        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1304)
>        at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
>        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1098)
>        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:354)
>        ... 4 more
>Caused by: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xb) was found in the element content of the document.
>        at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1213
>_______________________________________________
>To control your jdom-interest membership:
>http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com
>
>  
>




