[jdom-interest] Need to optionally cancel automatic escaping
Alex Rosen
arosen at novell.com
Thu Jul 17 06:52:35 PDT 2003
Sounds fine to me.
I didn't understand #3 though.
Alex
>>> "Bradley S. Huffman" <hip at cs.okstate.edu> 7/11/2003 5:23:33 PM >>>
Perfect timing. A while back James Clark posted this on the xml-dev
mailing list:
If your infoset contains a carriage return, you have to output
it as a numeric character reference, otherwise line-end
normalization will turn it into a line-feed. Similarly, if
attribute values in the infoset contain line-feeds or tabs, they
need to be output as numeric character references, otherwise
attribute value normalization will turn them into spaces...When
I'm creating XML, some parts of what I am creating may well have
come from parsing an XML document. That means if there's any
XML infoset that my program cannot serialize correctly, it's
potentially a bug.
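The normalization Clark describes is easy to observe with any conforming parser. The following is a minimal sketch that uses the JDK's built-in DOM parser (javax.xml.parsers) in place of JDOM's SAXBuilder. The class and method names are illustrative only. It shows that literal characters are rewritten by the parser, while the same characters written as numeric character references survive.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Illustrative demo class, not part of JDOM: shows XML 1.0 line-end and
// attribute-value normalization using the JDK's DOM parser.
final class NormalizationDemo {

    /** Parses a small XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Line-end normalization: a literal CR in content comes back as LF...
        System.out.println(parseRoot("<r>a\rb</r>").getTextContent().equals("a\nb"));      // true
        // ...but a CR written as a numeric character reference survives.
        System.out.println(parseRoot("<r>a&#13;b</r>").getTextContent().equals("a\rb"));   // true
        // Attribute-value normalization: a literal tab and LF become spaces...
        System.out.println(parseRoot("<r a='x\ty\nz'/>").getAttribute("a").equals("x y z"));        // true
        // ...but character references keep them intact.
        System.out.println(parseRoot("<r a='x&#9;y&#10;z'/>").getAttribute("a").equals("x\ty\nz")); // true
    }
}
```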
To which Elliotte Rusty Harold asked on his XOM mailing list (XOM's
Serializer and JDOM's XMLOutputter are similar, so issues affecting
one usually affect the other):
I don't think the XOM serializer bothers to escape such carriage
returns, line feeds, tabs and the like where Clark suggests it
should. Should it? Or should this at least be an option in the
Serializer? And if it is an option, should it be the default
option?
Thoughts?
That led to a two-day thread about what, if anything, should be done
about carriage returns, line feeds, and tabs in attribute values and
text content. In response, John Cowan proposed the following
algorithm:
In that case, the default mode should:
1) Escape all \r characters;
2) Escape \t and \n characters in attribute values;
3) Output \n characters in character content as the line terminator;
4) Escape all non-encodable characters;
5) Encode everything else.
Doing anything else will not preserve the infoset through a round trip.
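Cowan's five rules map naturally onto a character-by-character escaper. This is a minimal sketch, not JDOM code: RoundTripEscaper is a hypothetical helper, it assumes attribute values are delimited by double quotes, and it uses java.nio's CharsetEncoder.canEncode as the test for rule 4. It also escapes &, <, > and " as any serializer must.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper (not part of JDOM) applying Cowan's five rules.
// Assumes well-formed UTF-16 input and double-quoted attribute values.
final class RoundTripEscaper {

    private final CharsetEncoder encoder; // used only to test encodability (rule 4)

    RoundTripEscaper(Charset charset) {
        this.encoder = charset.newEncoder();
    }

    /** Escapes character content (rules 1, 3, 4, 5). */
    String escapeText(String s) { return escape(s, false); }

    /** Escapes a double-quoted attribute value (rules 1, 2, 4, 5). */
    String escapeAttribute(String s) { return escape(s, true); }

    private String escape(String s, boolean inAttribute) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append(inAttribute ? "&quot;" : "\""); break;
                case '\r': out.append("&#13;"); break;            // rule 1: always escape CR
                case '\t':
                    out.append(inAttribute ? "&#9;" : "\t");      // rule 2 in attributes only
                    break;
                case '\n':
                    // Rule 2 in attributes; rule 3 in content, where a literal line
                    // terminator (even \r\n) normalizes back to a single \n.
                    out.append(inAttribute ? "&#10;" : "\n");
                    break;
                default:
                    String ch = new String(Character.toChars(cp));
                    if (encoder.canEncode(ch)) {
                        out.appendCodePoint(cp);                  // rule 5: encode as-is
                    } else {
                        out.append("&#").append(cp).append(';');  // rule 4: char reference
                    }
            }
        }
        return out.toString();
    }
}
```

For example, with US-ASCII output, escapeAttribute("a\tb") yields a&#9;b and escapeText("caf\u00e9") yields caf&#233;.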
#1-#3 would be fairly easy to do in XMLOutputter since we already
escape & and >. #4 and #5 I think are already handled by the default
escape strategy, but I haven't looked closely enough to give a
definitive answer. This would provide round-tripping by default in
these two cases,
text -> SAXBuilder -> JDOM tree -> XMLOutputter -> text
JDOM tree -> XMLOutputter -> text -> SAXBuilder -> JDOM tree
neither of which JDOM currently handles.
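The second case (tree -> text -> tree) can be checked mechanically. The sketch below is illustrative only: it swaps in the JDK's DOM parser for SAXBuilder and a tiny hand-written serializer for XMLOutputter, and a single attribute plus text value stands in for the tree. It applies just rules 1-3 and compares the reparsed values with the originals, with and without the extra escaping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Illustrative round-trip check; not JDOM API. Serializes <r a="...">...</r>,
// reparses it, and compares attribute and text with the originals.
final class RoundTripCheck {

    /** Serializes one element; escapeWhitespace toggles rules 1-3. */
    static String serialize(String attr, String text, boolean escapeWhitespace) {
        return "<r a=\"" + escape(attr, true, escapeWhitespace) + "\">"
                + escape(text, false, escapeWhitespace) + "</r>";
    }

    private static String escape(String s, boolean inAttr, boolean escapeWhitespace) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '&') out.append("&amp;");
            else if (c == '<') out.append("&lt;");
            else if (c == '>') out.append("&gt;");
            else if (c == '"' && inAttr) out.append("&quot;");
            else if (escapeWhitespace && c == '\r') out.append("&#13;");          // rule 1
            else if (escapeWhitespace && inAttr && (c == '\t' || c == '\n'))
                out.append("&#").append((int) c).append(';');                     // rule 2
            else out.append(c);                                                   // rule 3: \n literal in text
        }
        return out.toString();
    }

    /** True if reparsing the serialized form yields the original attribute and text. */
    static boolean roundTrips(String attr, String text, boolean escapeWhitespace) {
        String xml = serialize(attr, text, escapeWhitespace);
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return root.getAttribute("a").equals(attr) && root.getTextContent().equals(text);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String attr = "x\ty\nz", text = "line1\r\nline2";
        System.out.println(roundTrips(attr, text, false)); // false: tab/LF become spaces, CR is lost
        System.out.println(roundTrips(attr, text, true));  // true
    }
}
```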
Thoughts?
Brad