[jdom-interest] Special characters not being encoded as UTF-8

Robert Herold rherold at xetus.com
Tue Mar 28 12:18:44 PST 2006


I'm trying to produce XML with special characters (e.g. the section sign,
U+00A7, which is byte 0xA7 in ISO-8859-1 and not an ASCII character) in the
text content of an element.  I would expect XMLOutputter to encode these
characters as UTF-8, but it doesn't seem to.  How do I get it to encode the
special characters as UTF-8?  Or do I have to encode them before adding them
to the document?

Consider this test program:

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.output.XMLOutputter;

public class OutputXML {
	// The section sign, U+00A7.
	private static final String SECTION_SIGN = "§";

	public static void main(String[] args) {

		// Build a document whose root element contains only the section sign.
		Document doc1 = new Document();
		Element elem = new Element("elem");
		doc1.setRootElement(elem);
		elem.addContent(SECTION_SIGN);

		// Serialize to a String and print it to standard output.
		XMLOutputter outputter = new XMLOutputter();
		String text = outputter.outputString(doc1);
		System.out.println(text);
	}
}

It produces the output:

<?xml version="1.0" encoding="UTF-8"?>
<elem>§</elem>

In a hex-dump of the output, one can see that the section-sign is left as
0xA7 (at offset 0x2e in the output), instead of being UTF-8 encoded:

000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31  ><?xml version="1<
000010 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 55 54  >.0" encoding="UT<
000020 46 2d 38 22 3f 3e 0d 0a 3c 65 6c 65 6d 3e a7 3c  >F-8"?>..<elem>.<<
000030 2f 65 6c 65 6d 3e 0d 0a 0d 0a                    >/elem>....<
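For comparison, here is a minimal sketch (JDK only, no JDOM) of the bytes the
section sign becomes under two encodings.  The class name and the hex()
helper are hypothetical, made up for this illustration.  The single byte a7
in the dump above matches the ISO-8859-1 form; UTF-8 would need two bytes.

```java
import java.nio.charset.StandardCharsets;

public class SectionSignBytes {
	// Hypothetical helper: renders bytes as space-separated lowercase hex.
	static String hex(byte[] bytes) {
		StringBuilder sb = new StringBuilder();
		for (byte b : bytes) {
			if (sb.length() > 0) {
				sb.append(' ');
			}
			// Mask to 0..255 so negative byte values print as two hex digits.
			sb.append(String.format("%02x", b & 0xff));
		}
		return sb.toString();
	}

	public static void main(String[] args) {
		String sectionSign = "\u00A7";
		// prints "ISO-8859-1: a7"
		System.out.println("ISO-8859-1: " + hex(sectionSign.getBytes(StandardCharsets.ISO_8859_1)));
		// prints "UTF-8: c2 a7"
		System.out.println("UTF-8: " + hex(sectionSign.getBytes(StandardCharsets.UTF_8)));
	}
}
```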

Shouldn't XMLOutputter encode this character as UTF-8?
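One possible explanation, offered as an assumption and not a confirmed
diagnosis: outputString() returns a Java String, which has no byte encoding
of its own, so the bytes in the dump would be chosen by System.out (which
uses the platform default charset) and not by XMLOutputter.  The sketch below
uses only the JDK; the class name, the printWith() helper and the hard-coded
xml string (standing in for the outputter's result) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class PrintStreamEncoding {
	// Hypothetical helper: prints text through a PrintStream that uses the
	// named charset and returns the bytes that were written.
	static byte[] printWith(String text, String charsetName) {
		ByteArrayOutputStream sink = new ByteArrayOutputStream();
		try {
			PrintStream out = new PrintStream(sink, true, charsetName);
			out.print(text);
			out.flush();
		} catch (UnsupportedEncodingException e) {
			throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
		}
		return sink.toByteArray();
	}

	public static void main(String[] args) {
		// Stand-in for the String returned by outputString().
		String xml = "<elem>\u00A7</elem>";
		// Same String, different byte counts depending on the stream's charset.
		System.out.println("ISO-8859-1: " + printWith(xml, "ISO-8859-1").length + " bytes"); // 14 bytes
		System.out.println("UTF-8: " + printWith(xml, "UTF-8").length + " bytes"); // 15 bytes
	}
}
```

If that assumption holds, passing an OutputStream directly to
XMLOutputter.output(Document, OutputStream), instead of printing the String,
may let the outputter apply the declared encoding itself.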

Thanks for any insights, and forgive me if this is answered elsewhere - I
couldn't find it in a morning of searching!

-- Robert Herold




