[jdom-interest] Special characters not being encoded as UTF-8

Robert Herold rherold at xetus.com
Wed Mar 29 14:53:20 PST 2006


The problem showed up while composing and sending XML between applications
over the network, so System.out.println never figured into my real problem,
just into the test case demonstrating it.  

I understand now, however, that it is simply an output issue.  I'll
investigate how to send true UTF-8 from my ostensibly correct String
representation of the XML.  Thanks for setting me straight, and apologies
for the bother.
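
For the network case described above, one possible approach is to name the
charset at the exact point where the String becomes bytes. The sketch below
uses only the JDK. The class and method names (Utf8Bytes, toUtf8, hex) are
made up for illustration, and StandardCharsets assumes Java 7 or later; on
older JDKs, getBytes("UTF-8") should be the equivalent call.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Encode a String as UTF-8 bytes. The charset is named explicitly,
    // so the platform default charset never comes into play.
    public static byte[] toUtf8(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated lowercase hex, like one hex-dump row.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00A7 is the section sign; the escape avoids any dependence
        // on the encoding of this source file.
        String xml = "<elem>\u00A7</elem>";

        // UTF-8: the section sign becomes the two bytes c2 a7.
        System.out.println(hex(toUtf8(xml)));

        // ISO-8859-1: the section sign stays the single byte a7.
        System.out.println(hex(xml.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The same idea should carry over to a socket: wrapping its OutputStream in an
OutputStreamWriter constructed with "UTF-8" keeps the default charset out of
the picture.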

(It figures that it would be pilot error - JDOM has been stable for a while,
and I'm a relatively new user...)

-- Robert Herold

-----Original Message-----
From: Paul Libbrecht [mailto:paul at activemath.org] 
Sent: Tuesday, March 28, 2006 11:45 PM
To: Jason Hunter
Cc: Robert Herold; jdom-interest at jdom.org
Subject: Re: [jdom-interest] Special characters not being encoded as UTF-8

System.out.println(string) is a complete killer for anything other than
ASCII, since it doesn't make the encoding explicit.

But System.out is a stream, so new
XMLOutputter().output(document, System.out) should do the job properly.

How you see it in the console is yet another challenge, btw!
First try piping the output of the process to a file, then view the file
with various encodings.
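
To make the point about println concrete, here is a minimal JDK-only sketch
showing that the bytes a PrintStream emits depend on the charset it was
constructed with. The names ExplicitPrintStream and printWith are
hypothetical, and an in-memory ByteArrayOutputStream stands in for the
console or a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ExplicitPrintStream {
    // Print text through a PrintStream bound to the named charset and
    // return the bytes that actually reached the underlying stream.
    public static byte[] printWith(String charsetName, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            // Unknown charset name: surface it as an unchecked error.
            throw new IllegalArgumentException(charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String sectionSign = "\u00A7";

        // Same String, different byte counts depending on the charset.
        System.out.println(printWith("UTF-8", sectionSign).length);      // 2
        System.out.println(printWith("ISO-8859-1", sectionSign).length); // 1
    }
}
```

Wrapping System.out the same way, as in new PrintStream(System.out, true,
"UTF-8"), should give a println that always emits UTF-8, whatever the
platform default is. Whether the console then displays it correctly is a
separate matter.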

paul

Jason Hunter wrote:
> XMLOutputter does output as UTF-8 unless you dictate otherwise, but 
> you're asking the outputter to return a String.  So it does, and 
> Strings in Java are just a sequence of characters (they have no 
> associated byte encoding).  Then when you print that string with 
> System.out you're dropping into your system's native charset which 
> probably isn't UTF-8.
>
> Bottom line, you're printing a String using System.out which isn't
> UTF-8 friendly.  XMLOutputter did the proper job returning an abstract 
> String representation of the chars.
>
> -jh-
>
> Robert Herold wrote:
>> I'm trying to produce XML with special characters (e.g. Latin-1 0xA7,
>> U+00A7, the section-sign) in the text content of an element.  I
>> would expect XMLOutputter to encode these characters as UTF-8, but it
>> doesn't.  How do I get it to encode the special characters as UTF-8?
>> Or do I have to encode them before adding to the document?
>>
>> Consider this test program:
>>
>> import org.jdom.Document;
>> import org.jdom.Element;
>> import org.jdom.input.SAXBuilder;
>> import org.jdom.output.XMLOutputter;
>>
>> public class OutputXML {
>>     private static String SECTION_SIGN = "§";
>>
>>     public static void main(String[] args) {
>>
>>         Document doc1 = new Document();
>>         Element elem = new Element("elem");
>>         doc1.setRootElement(elem);
>>         elem.addContent(SECTION_SIGN);
>>
>>         XMLOutputter outputter = new XMLOutputter();
>>         String text = outputter.outputString(doc1);
>>         System.out.println(text);
>>     }
>> }
>>
>> It produces the output:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <elem>§</elem>
>>
>> In a hex-dump of the output, one can see that the section-sign is
>> left as 0xA7 (at offset 0x2e in the output), instead of being UTF-8
>> encoded:
>>
>> 000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31  ><?xml version="1<
>> 000010 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 55 54  >.0" encoding="UT<
>> 000020 46 2d 38 22 3f 3e 0d 0a 3c 65 6c 65 6d 3e a7 3c  >F-8"?>..<elem>.<<
>> 000030 2f 65 6c 65 6d 3e 0d 0a 0d 0a                    >/elem>....<
>>
>> Shouldn't XMLOutputter encode this character as UTF-8?
>>
>> Thanks for any insights, and forgive me if this is answered elsewhere;
>> I couldn't find it in a morning of searching!
>>
>> -- Robert Herold
>>
>>
>>
>> _______________________________________________
>> To control your jdom-interest membership:
>> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>>






