[jdom-interest] UTF8 charset issues...

Fri Oct 10 05:35:20 PDT 2003

Hi all,

I am trying to understand how JDOM handles character encodings. Here is
what I am doing:

I have a Java app which reads data from an XML file (UTF-8 encoded). I
am able to get the text just fine using
String str = anElement.getText();

The resulting str string (Unicode) contains exactly what was defined in
my XML file. The charset translation is transparent to me here. For
example, if my XML document is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOCUMENT SYSTEM "annonce.dtd">
<DOCUMENT>
     <TEXT>Æ</TEXT>
</DOCUMENT>

I get Æ in my str string.

However, when I try to generate an XML document with this exact same Æ
value, just calling Element.setText("Æ") does not produce a correctly
UTF-8 encoded document. I first have to do this manually in my code:
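		String text = "Æ";
		try {
			// Encode the string to UTF-8 bytes...
			byte[] bytes = text.getBytes("UTF8");
			// ...then decode those bytes with the platform default charset.
			String newText = new String(bytes);
			setText(newText);
		} catch (UnsupportedEncodingException uee) {
			uee.printStackTrace();
		}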
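
To make clear what that workaround does to the string, here is a minimal JDK-only sketch. The class and method names are made up for illustration, and I pass ISO-8859-1 explicitly as an assumption about the platform default charset that new String(bytes) would otherwise pick up; on a machine with a different default the result would differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WorkaroundDemo {
    // Hypothetical helper: same steps as the workaround, but with the
    // decoding charset made explicit instead of the platform default.
    static String encodeThenDecode(String text, Charset decodeWith) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8); // Æ -> 0xC3 0x86
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u00C6"; // Æ, written as an escape to avoid source-encoding issues
        // Assumption: the platform default charset is ISO-8859-1.
        String newText = encodeThenDecode(text, StandardCharsets.ISO_8859_1);
        System.out.println(text.length());    // 1
        System.out.println(newText.length()); // 2
    }
}
```

So, under that assumption, the single character Æ (U+00C6) becomes two characters (U+00C3, U+0086) before it ever reaches setText().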

Why do I have to do this for the XML generation to work? Why isn't JDOM
taking care of the charset translation for me, since the resulting file
has UTF-8 encoding specified in it?
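
For comparison, this is roughly what I would expect the output step to do with the unmodified string. It is a JDK-only sketch, not JDOM code: the class and method names are made up, and it assumes the encoding is chosen at the point where characters are turned into bytes (here an OutputStreamWriter created with an explicit UTF-8 charset).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8WriteDemo {
    // Hypothetical helper: serializes a tiny document, encoding characters
    // to bytes with an explicitly chosen charset (UTF-8).
    static byte[] writeAsUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            // No escaping of <, > or & here; this is only a sketch.
            writer.write("<DOCUMENT><TEXT>" + text + "</TEXT></DOCUMENT>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = writeAsUtf8("\u00C6"); // Æ, passed in unmodified
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.contains("\u00C6"));      // true
        System.out.println(bytes.length - decoded.length()); // 1, since Æ takes two bytes
    }
}
```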

Thanks for any help

Patrick