[jdom-interest] SAXbuilder and escape sequences

Luke Majewski luke.majewski+jdom at gmail.com
Wed Oct 12 07:37:23 PDT 2005


Hi all,

I have scoured the web for a solution to this and I am stumped. I have an
XML file with elements like:

<pr type="US">&stress1;r&aelig;bit </pr>

When I read this in through the SAXBuilder, I get question marks and
strange characters instead of the actual text.

Here is the code I am currently using. I figured it was an encoding issue,
but setting the encoding is not doing the trick:


SAXBuilder sb = new SAXBuilder("org.apache.crimson.parser.XMLReaderImpl");

InputSource is = new InputSource("file:///d:/workspace/OACD/OACD_rz.xml");
is.setEncoding("UTF-8");
sb.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        return new InputSource("file:///d:/workspace/oup-character-entities.ent");
    }
});
document = sb.build(is);

The XML header is:

<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl' href="http://somestyle.xsl"?>
<!DOCTYPE dictionary SYSTEM "dictionary.dtd">
<dictionary xml:space='preserve'>
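Note that the header references dictionary.dtd, while the resolver in the code above returns the same entity file for every request, whatever the system ID. Whether that contributes to the problem is not established here. As a point of comparison, the following is a minimal sketch of a resolver that redirects only entity files and leaves everything else to the parser. The class name SelectiveEntityResolver and the ".ent" suffix check are assumptions made for illustration, not part of JDOM or of the original code.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: redirects only ".ent" requests to a fixed entity file.
final class SelectiveEntityResolver implements EntityResolver {
    // URL of the entity file to serve, e.g. the path used in the post.
    private final String entityFileUrl;

    SelectiveEntityResolver(String entityFileUrl) {
        this.entityFileUrl = entityFileUrl;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // Assumption: entity files can be recognized by an ".ent" suffix.
        if (systemId != null && systemId.endsWith(".ent")) {
            return new InputSource(entityFileUrl);
        }
        // Returning null asks the parser to open the system ID itself,
        // so a request for the DTD is not answered with the entity file.
        return null;
    }
}
```

It could be installed with sb.setEntityResolver(new SelectiveEntityResolver("file:///d:/workspace/oup-character-entities.ent")) in place of the anonymous class.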

When I call getText() on the pr element, I get back "?r?bit".
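One way to narrow this down is to check whether the returned string really contains question marks or whether the characters are only lost when they are printed. The sketch below dumps the code points of a string. The class CodePointDump is a hypothetical helper, not part of JDOM. It also assumes that &aelig; is declared as U+00E6 in the entity file, which the post does not confirm.

```java
// Hypothetical diagnostic helper: shows which characters a string holds.
final class CodePointDump {
    // Formats each code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }
}
```

Calling CodePointDump.describe on the result of getText() would show U+003F wherever the parsed text contains a literal question mark. A value such as U+00E6 in that position would instead suggest the text was parsed as intended and was altered on output.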

I assume I am missing something obvious; a pointer to the right section of
the documentation would be sufficient.

Thank you,

Luke Majewski