[jdom-interest] XMLOutputter and utf-8

Chris Curvey ccurvey at gmail.com
Fri May 20 06:39:52 PDT 2005


Thanks to Jason & Paul for their responses. I tried Jason's suggestion for 
my example, and it works great. (I realize that this question is 
increasingly off-topic; please forgive me.)

In my real-world problem, I'm not writing to System.out, I'm writing to an 
output stream returned from an HttpsURLConnection. So I tried this:

Document doc = getXML();
XMLOutputter out = new XMLOutputter();
out.setEncoding("UTF-8");
String renderedDoc = out.outputString(doc);

// Construct the request headers
setupHeaders(theConnection, renderedDoc.length());

// Send the request
OutputStream output = theConnection.getOutputStream();
out.output(doc, output);
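
One thing that may matter here: `String.length()` counts UTF-16 `char`s, not bytes, so for text containing an E-acute it is smaller than the UTF-8 byte count ("CLOISONNÉ" is 9 chars but 10 UTF-8 bytes). If `setupHeaders` (Chris's own helper, whose body isn't shown) uses that value as the Content-Length, the header and the body would disagree. The sketch below uses only the JDK and assumes the document has already been rendered to a String whose XML declaration says UTF-8. It encodes the String once and uses the same byte array for both the declared length and the body. The class and method names are hypothetical, and try-with-resources assumes a modern JDK.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JDOM: sends an already-rendered XML
// string as UTF-8 and keeps the declared length in sync with the bytes.
class Utf8Body {

    // Encode once; every later step works on the same byte array.
    static byte[] encode(String renderedDoc) {
        return renderedDoc.getBytes(StandardCharsets.UTF_8);
    }

    // HttpsURLConnection extends HttpURLConnection, so this accepts either.
    static void send(HttpURLConnection conn, String renderedDoc) throws IOException {
        byte[] body = encode(renderedDoc);
        conn.setDoOutput(true);
        // Declare the byte count, not renderedDoc.length() (a char count).
        conn.setFixedLengthStreamingMode(body.length);
        try (OutputStream output = conn.getOutputStream()) {
            output.write(body);
        }
    }

    public static void main(String[] args) {
        String text = "CLOISONN\u00C9"; // \u00C9 is E-acute
        System.out.println("chars: " + text.length());
        System.out.println("UTF-8 bytes: " + encode(text).length);
    }
}
```

Writing the bytes from `encode` also avoids running the outputter twice (once for `outputString`, once for `output`).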

I don't have access to the server on the other end of that connection, and 
the connection is encrypted, so I can't just put in a proxy server to 
capture the stream to see what's really being sent.
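
Even without a proxy, the bytes can be inspected locally: render into a `ByteArrayOutputStream` instead of the connection's stream, then print them in hex. With JDOM that would mean passing the buffer to `XMLOutputter.output(Document, OutputStream)`. To keep the sketch runnable without JDOM on the classpath, it substitutes an `OutputStreamWriter` with an explicit UTF-8 charset for that step. The class name `HexDump` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class HexDump {

    // Formats each byte as two uppercase hex digits separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Stand-in for out.output(doc, buffer): write the text through a UTF-8 writer
    // into an in-memory buffer instead of the network stream.
    static byte[] renderUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(buffer, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory buffer.
            throw new UncheckedIOException(e);
        }
        // The writer was closed (and therefore flushed) above.
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // U+00C9 is E-acute; correct UTF-8 output is the pair C3 89.
        System.out.println(hex(renderUtf8("\u00C9")));
    }
}
```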

One more data point, which may or may not be important: I have to use the 
Beta-7 version of JDOM, because it's distributed as part of my app server, 
and putting JDOM 1.0 earlier in the classpath causes the app server to 
choke.

Many, many thanks for any help.

-Chris

On 5/20/05, Jason Hunter <jhunter at xquery.com> wrote:
> 
> You're not actually outputting the file to a byte stream. You're
> outputting it to a String, then printing the string using
> System.out.println(). System.out is a PrintStream and per the
> PrintStream Javadocs, "All characters printed by a PrintStream are
> converted into bytes using the platform's default character encoding."
> 
> Try this: out.output(doc, System.out);
> 
> That way JDOM gets to control the bytes being output.
> 
> -jh-
> 
> Chris Curvey wrote:
> 
> > Hi all,
> >
> > I'm having a little trouble figuring out utf-8 encoding with JDOM. The
> > output from this sample program is returning a single hex value, \xc9
> > for an E-acute, but according to this page
> > http://www.fileformat.info/info/unicode/char/00c9/index.htm, the UTF-8
> > encoding for E-acute should be a hex pair \xc3 and \x89. (\xc9 appears
> > to be the right value for UTF-16.)
> >
> > Any idea what I'm doing wrong? Or am I just misinterpreting something?
> >
> > import org.jdom.Document;
> > import org.jdom.Element;
> > import org.jdom.output.XMLOutputter;
> > import org.jdom.output.Format;
> >
> > class JdomTest
> > {
> >     public static void main (String[] argv)
> >     {
> >         Document doc = new Document();
> >         Element element = new Element("foobar");
> >         element.setText("CLOISONNÉ");
> >         doc.addContent(element);
> >
> >         Format format = Format.getPrettyFormat();
> >         format.setEncoding("UTF-8");
> >         XMLOutputter out = new XMLOutputter(format);
> >         System.out.println(out.outputString(doc));
> >     }
> > }
> >
>
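
Jason's point can be checked with the JDK alone: the same String written through a `PrintStream` comes out as whatever bytes that stream's charset dictates. The sketch below compares UTF-8, ISO-8859-1 (chosen here only as an example of a common platform default; which default a given machine uses is an assumption) and UTF-16BE for E-acute. A lone `C9` byte is what ISO-8859-1 produces; UTF-16 always uses at least two bytes, `00 C9` in big-endian order. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class PrintStreamCharsets {

    // Prints text through a PrintStream using the named charset and returns the raw bytes.
    static byte[] printWith(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream ps = new PrintStream(buffer, true, charsetName);
            ps.print(text);
            ps.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(charsetName, e);
        }
        return buffer.toByteArray();
    }

    // Formats each byte as two uppercase hex digits separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00C9";
        // A PrintStream built without a charset argument uses the platform
        // default, which is the behavior the Javadoc Jason quotes describes.
        for (String cs : new String[] {"UTF-8", "ISO-8859-1", "UTF-16BE"}) {
            System.out.println(cs + ": " + hex(printWith(eAcute, cs)));
        }
    }
}
```

The UTF-8 pair follows from the encoding rule for code points up to U+07FF: U+00C9 is 000 1100 1001 in 11 bits, split into 00011 and 001001, giving 110 00011 = C3 and 10 001001 = 89.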