[jdom-interest] Yet another TODO (entity escapes)

guru at stinky.com
Tue Jun 19 10:12:18 PDT 2001


* Figure out how to deal with XMLOutputter writing of special characters like
  &#160;.  Should it char escape only chars unprintable in the current
  character set?  Or should there be a fancy API for selecting what's escaped?
  http://lists.denveronline.net/lists/jdom-interest/2001-February/004521.html

It seems to me that this is a parser issue.  If XMLOutputter is passed
an EntityRef whose name is a character code (e.g. #160), it outputs it
as a reference, exactly as it was given:

	    Element funky = new Element("funky")
	      .addContent( new EntityRef("#x2022") )
	      .addContent("Bullet")
	      .addContent(" ")
	      .addContent( new EntityRef("#160") );
	    new XMLOutputter().output(funky, System.out);

outputs <funky>&#x2022;Bullet &#160;</funky> as expected.

However, the SAX parser expands &#xxx; character references into the
corresponding Unicode characters, *even when you call
setExpandEntities(false)*.  That sounds like either a bug or a design
flaw in SAX parsers, or in SAXBuilder (which I haven't looked at
closely).  Shouldn't they return EntityRef objects?
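To check the parser side without JDOM in the loop, here is a minimal sketch using the JDK's built-in SAX parser (javax.xml.parsers) rather than SAXBuilder; CharRefDemo and textOf are made-up names. As far as I know, SAX 2 has no callback for character references at all (LexicalHandler's startEntity/endEntity cover general entities only), so the references arrive in characters() already expanded, and a builder sitting on top of SAX would have nothing left to turn into an EntityRef.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharRefDemo {
    // Parses xml with the JDK's default SAX parser and returns all the
    // character data reported through characters(), concatenated.
    static String textOf(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length); // may be called several times
                }
            });
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = textOf("<funky>&#x2022;Bullet &#160;</funky>");
        // One code point per line; the two references show up as
        // U+2022 and U+00A0, i.e. plain characters, not references.
        text.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```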

XMLOutputter is faithfully outputting what it was given: if the String
holds a high Unicode character, Java takes care of converting it to the
right bytes for the stream's output encoding.
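For the encoding point, a small sketch (EncodingDemo and encode are illustrative names) that pushes the character behind &#160;, U+00A0, through an OutputStreamWriter in three charsets. One caveat worth knowing: OutputStreamWriter does not throw on characters the charset can't represent; it silently substitutes the charset's replacement byte, which is '?' for US-ASCII. That silent substitution is arguably the case the TODO item is worried about.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Writes s through an OutputStreamWriter using charset cs and returns
    // the bytes that reached the underlying stream.
    static byte[] encode(String s, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, cs)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0"; // the character behind &#160;
        Charset[] charsets = {
            StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII
        };
        for (Charset cs : charsets) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(nbsp, cs)) {
                hex.append(String.format("%02X ", b));
            }
            // Prints "UTF-8: C2 A0", "ISO-8859-1: A0", "US-ASCII: 3F" ('?')
            System.out.println(cs + ": " + hex.toString().trim());
        }
    }
}
```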

OTOH, if someone wants to make sure that all Unicode characters turn
into their corresponding escapes on the way out, or vice versa, that's
a good use for a filter stream.
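One way the filter-stream idea might look for the "on the way out" direction: a hypothetical FilterWriter subclass (CharRefEscapingWriter is a made-up name, not part of JDOM) that rewrites only the characters the target charset can't encode as &#xHHHH; references, which is roughly the first option in the TODO item. It assumes such characters only appear in character data and attribute values: the filter sees the raw serialized output, and character references are not legal inside element names, comments, or CDATA sections.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical filter (not part of JDOM): characters the target charset
// can encode pass through; the rest become &#xHHHH; references.
public class CharRefEscapingWriter extends FilterWriter {
    private final CharsetEncoder encoder;
    private char pendingHigh; // high surrogate waiting for its low half, or 0

    public CharRefEscapingWriter(Writer out, Charset target) {
        super(out);
        this.encoder = target.newEncoder();
    }

    @Override
    public void write(int c) throws IOException {
        char ch = (char) c;
        if (pendingHigh != 0 && Character.isLowSurrogate(ch)) {
            emit(Character.toCodePoint(pendingHigh, ch)); // complete pair
            pendingHigh = 0;
            return;
        }
        flushPendingSurrogate();
        if (Character.isHighSurrogate(ch)) {
            pendingHigh = ch;  // wait for the low half
        } else if (Character.isLowSurrogate(ch)) {
            out.write(ch);     // stray low surrogate: pass through untouched
        } else {
            emit(ch);
        }
    }

    // FilterWriter's bulk writes bypass write(int), so route them through it.
    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(cbuf[i]);
        }
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(str.charAt(i));
        }
    }

    @Override
    public void close() throws IOException {
        flushPendingSurrogate();
        super.close();
    }

    private void flushPendingSurrogate() throws IOException {
        if (pendingHigh != 0) {
            out.write(pendingHigh); // unpaired high surrogate: pass through
            pendingHigh = 0;
        }
    }

    // Writes one code point, escaping it if the target charset can't encode it.
    private void emit(int codePoint) throws IOException {
        String s = new String(Character.toChars(codePoint));
        if (encoder.canEncode(s)) {
            out.write(s);
        } else {
            out.write(String.format("&#x%X;", codePoint));
        }
    }

    // Convenience helper: run a whole string through the filter.
    static String escape(String text, Charset target) {
        StringWriter sink = new StringWriter();
        try (Writer w = new CharRefEscapingWriter(sink, target)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        // Prints <funky>&#x2022;Bullet &#xA0;</funky>
        System.out.println(escape("<funky>\u2022Bullet \u00A0</funky>", StandardCharsets.US_ASCII));
    }
}
```

The idea would be to wrap the Writer before handing it to XMLOutputter (something like new XMLOutputter().output(doc, new CharRefEscapingWriter(writer, charset))), with the charset matching the encoding named in the XML declaration. Surrogate pairs are buffered so that characters outside the BMP become a single reference rather than two invalid ones.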

Either way, I think we can check this one off the todo list, at least
for XMLOutputter.

-- 
Alex Chaffee                       mailto:alex at jguru.com
jGuru - Java News and FAQs         http://www.jguru.com/alex/
Creator of Gamelan                 http://www.gamelan.com/
Founder of Purple Technology       http://www.purpletech.com/
Curator of Stinky Art Collective   http://www.stinky.com/


