[jdom-interest] Turning off entity expansion

Stephan Trebels stephan at ncube.de
Wed Sep 4 00:58:58 PDT 2002


Hi,

I think the main confusion in this thread is about what the charter of
JDOM is.  As I understand it, JDOM is about "Java-centric representation
of parsed XML".  Note the "parsed" here!


Parsing XML (entities, elements, ...) is done by the parsers; JDOM
merely maintains the results.  You can add any string to the contents
of an element, and _this_ string is going to be in the _parsed_ XML.
Safety has to be applied by the XML outputter, to make sure that the
same _parsed_ XML is recreated by another parser.

Therefore, '&', '<', '>' cannot be output unescaped, because that
would violate the above principles.  If you add a '&' to the parsed
XML (via "&amp;" in the not-yet-parsed XML or via Java), it will be
output as "&amp;", as this is the XML representation of a '&'.


But back to the point of the original email:

You asked about the character entity being expanded (which is correct
for the parsed XML).  I'd certainly expect the character to be stored
as Unicode internally.  You can expect, though, that if the right
encoding is set when driving the XML outputter, character entities
are generated for _all_ characters that need them.  For ASCII output,
that would be at the very least '<', '>', '&', all 8-bit characters,
all multibyte characters, ...
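
As a rough illustration of "characters that need it", the sketch below
asks the JDK whether the target charset can encode each character and
writes a numeric character reference when it cannot.  CharRefSketch and
encodeForCharset are hypothetical names, and this is only an assumption
about how an encoding-aware outputter could behave, not a description
of what JDOM's outputter actually does.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch of encoding-aware text output. Not JDOM code.
class CharRefSketch {

    // Escape markup characters, and replace every character the given
    // charset cannot encode with a hexadecimal character reference.
    static String encodeForCharset(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder out = new StringBuilder();
        // Walk by code point so surrogate pairs stay together.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String s = new String(Character.toChars(cp));
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (encoder.canEncode(s)) {
                out.append(s);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 is e-acute, U+20AC is the euro sign.
        String text = "caf\u00e9 < 5\u20ac";
        System.out.println(encodeForCharset(text, "US-ASCII"));
        // prints: caf&#xE9; &lt; 5&#x20AC;
    }
}
```

With "UTF-8" as the charset name, the same call leaves the e-acute and
the euro sign as they are and only escapes the '<'.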


Concerning inline XML:

You'd like to use something like 

void parseAndAddXML(String xmlFragment)

or

void addUnparsedXML(String xmlFragment)

But now you have to either parse XML in the context of a JDOM element
or store the XML unparsed.  Storing unparsed XML would break any
guarantee that the generated XML can be parsed at all, or that the
current JDOM tree can be regenerated from it.  I would not allow this
in any code I'm responsible for.  Compare the code:

element.addUnparsedXML("<Appointment date='2002-03-03T02:00:00.000'/>");

element.addContent(new Element("Appointment")
	           .setAttribute("date", "2002-03-03T02:00:00.000"));

How many mistakes am I likely to make in the first version that I
cannot possibly make in the second?  And how would you validate things?
It is more trouble than it's worth.
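
For what the "parse and add" variant might look like, here is a sketch
against the JDK's DOM API rather than JDOM.  FragmentSketch,
parseAndAdd and buildAppointment are hypothetical names, and the
"Calendar" root element is made up.  The point is only that a fragment
handed in as a string can fail at run time, while the tree-building
call cannot produce unbalanced markup in the first place.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch using the JDK's DOM classes, not JDOM.
class FragmentSketch {

    static DocumentBuilder newBuilder() {
        try {
            DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            b.setErrorHandler(new DefaultHandler());
            return b;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Variant 1: build the element through the tree API.
    static Element buildAppointment(Document doc, String date) {
        Element e = doc.createElement("Appointment");
        e.setAttribute("date", date);
        return e;
    }

    // Variant 2: parse a string fragment and append it to parent.
    // Returns false, leaving parent untouched, if the fragment is not
    // well-formed XML.
    static boolean parseAndAdd(Element parent, String xmlFragment) {
        try {
            Document frag = newBuilder()
                    .parse(new InputSource(new StringReader(xmlFragment)));
            parent.appendChild(parent.getOwnerDocument()
                    .importNode(frag.getDocumentElement(), true));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newBuilder().newDocument();
        Element root = doc.createElement("Calendar");
        doc.appendChild(root);

        root.appendChild(buildAppointment(doc, "2002-03-03T02:00:00.000"));
        System.out.println(parseAndAdd(root,
                "<Appointment date='2002-03-03T02:00:00.000'/>"));  // prints: true
        // Forgetting the "/" is a mistake only the string variant allows.
        System.out.println(parseAndAdd(root,
                "<Appointment date='2002-03-03T02:00:00.000'>"));   // prints: false
    }
}
```

Even in this form the caller has to decide what to do when the fragment
is rejected, which the tree-building variant never has to.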


Stephan


On Wed, Sep 04, 2002 at 07:54:26AM +0100, ion wrote:
> also there is the case where JDOM expands
> '<' into "&lt;" and '>' into "&gt;", this may not
> always be the desired action, especially if one
> wants to let the user include XML content, in
> this case one would not want the '<' and '>'
> altered in any way.
> 
> OK, it would be possible to parse the user
> input bit and replace the '<' and '>' pairs with
> Elements, but then... how does one implement
> inline elements in JDOM?
> 
> regards
> 
> Empty

-- 
[------------ Stephan Trebels <stephan at ncube.de>, Consultant -----------]
company: nCUBE Deutschland GmbH, Hanauer Str. 56, 80992 Munich, Germany
phone: cell:+49 172 8433111  office:+49 89 149893 0  fax:+49 89 149893 50