[jdom-interest] Turning off entity expansion

Paul Chapman chapman at zemsys.com
Wed Sep 4 00:45:46 PDT 2002


Firstly, the Text object is trying to be helpful by allowing you
to insert _any_ text content without it being misinterpreted as
XML markup. Thus, if you wanted to output:

   <test> x < y </test>

It helpfully makes it valid XML:

   <test> x &lt; y </test>

Ditto for the & and the > characters.
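
To make the escaping concrete, here is a minimal sketch of the kind of
substitution an XML outputter applies to text content. It is not JDOM's
actual implementation; the class EscapeSketch and the method escapeText
are hypothetical names invented for this illustration.

```java
// Hypothetical helper (not part of JDOM): illustrates the escaping an
// XML outputter applies to text so it cannot be mistaken for markup.
class EscapeSketch {
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <test> x &lt; y </test>
        System.out.println("<test>" + escapeText(" x < y ") + "</test>");
        // A character reference typed as plain text is escaped as well,
        // because the '&' is treated as literal text. prints: &amp;#169;
        System.out.println(escapeText("&#169;"));
    }
}
```

The second call shows why "&#169;" passed in as text comes out as
"&amp;#169;": the outputter has no way to know the string was meant
as markup.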

The question that now arises is: why do you want to dump predefined XML
content into your output? Where is it coming from? Could you perhaps
build this into JDOM too?

For example: you could build JDOM from that text and then insert your
mini-JDOM tree into your overall structure.
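
A rough sketch of that idea follows. To keep it runnable without extra
jars it uses the JDK's built-in W3C DOM rather than JDOM, so treat it
as an illustration of the approach and not as JDOM code. The class
FragmentSketch and its methods are hypothetical names, and the fragment
string is made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch using the JDK's W3C DOM (not JDOM): parse a
// well-formed fragment, then graft the resulting nodes into a larger tree.
class FragmentSketch {
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException("no XML parser available", e);
        }
    }

    // Parses xml (which must have a single root element) and appends
    // the parsed element to parent as real nodes, not as escaped text.
    static void appendFragment(Element parent, String xml) {
        try {
            Document mini = newBuilder().parse(new InputSource(new StringReader(xml)));
            // Nodes belong to one document, so copy before attaching.
            Node copy = parent.getOwnerDocument()
                              .importNode(mini.getDocumentElement(), true);
            parent.appendChild(copy);
        } catch (Exception e) {
            throw new IllegalArgumentException("fragment is not well-formed XML", e);
        }
    }

    static Element buildBody() {
        Document doc = newBuilder().newDocument();
        Element body = doc.createElement("body");
        doc.appendChild(body);
        appendFragment(body, "<p>x &lt; y, <b>bold</b> &#169;</p>");
        return body;
    }

    public static void main(String[] args) {
        Element p = (Element) buildBody().getFirstChild();
        System.out.println(p.getTagName());                          // p
        System.out.println(p.getElementsByTagName("b").getLength()); // 1
    }
}
```

With JDOM the shape should be much the same: build a small document
from the text with SAXBuilder and a StringReader, detach its root
element, and add that element to your own tree.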

Is this any help?

-Paul.

ion wrote:

>>OK, so JDOM has helpfully converted a character (the &) that could
>>be confused with an XML reserved character (<, >...) into &amp; for you.
>>This is normally what you would want, so I doubt it can be turned off.
>>
>>JDOM does not know that &#169; is already encoded for XML, so it tries
>>to do it for you.
>>
>>This comes back to your original comment:
>>
>> > >When I look at the output my Unicode reference has been
>> > >changed into the actual character, which I do not want, I want
>> > >this line to be output verbatim.
>>
>>So, why is the actual character not acceptable? I am not saying you
>>are right or wrong to want the original character, I am trying to
>>ascertain the reason why the translated character is not acceptable
>>to you. The copyright symbol appears quite happily in my browser
>>when I use it. Like this: ©
>>
> 
> Certain XML validators complained, but I guess this is no biggie; I will
> use it like that from now on. More importantly, I have now discovered
> the source of my problem: not being able to output '&' as '&' and not
> "&amp;", '<' as '<' instead of "&lt;" and '>' as '>' instead of "&gt;".
> 
> Maybe I am just ignorant of the correct way to
> do this. How would one go about inserting inline elements?
> 
> Would it not be easier to be able to allow the verbatim insertion of text
> sections so that one could more efficiently include sections of XML that
> one knows to be valid?
> 
> I would have thought that perhaps the Text object could have provided
> this functionality?
> 
> Regards
> 
> Empty
> 
>>-Paul.
>>
>>ion wrote:
>>
>>
>>>Here is an example, consider the following simple program:
>>>
>>>import java.io.*; import org.jdom.*;
>>>import org.jdom.input.*; import org.jdom.output.*;
>>>public class test {
>>>   public static void main(String args[]) {
>>>      Document doc = new Document(new Element("html"));
>>>      DocType docType = new DocType("html",
>>>            "-//W3C//DTD XHTML 1.0 Transitional//EN",
>>>            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
>>>      doc.setDocType(docType);
>>>      Element root = doc.getRootElement();
>>>      Element head = new Element("head");
>>>      head.addContent(new Element("title").setText("Blah"));
>>>      root.addContent(head);
>>>      Element body = new Element("body");
>>>      body.addContent(new Element("p").setText("&quot; &#169; blah blah"));
>>>      root.addContent(body);
>>>      String newItem = args[0];
>>>      XMLOutputter outputter = new XMLOutputter("  ", true);
>>>      outputter.setTextNormalize(false);
>>>      try {
>>>         outputter.output(doc, new FileWriter((newItem+".html")));
>>>      } catch(Exception e) { System.err.println(e.getMessage());}
>>>   }
>>>}
>>>
>>>(I apologise for the crapness of it, I quickly created it)
>>>Which is some program, that could perhaps be used to output
>>>templates for some html page, or more realistically include input
>>>from some other XML file. Executing this like this:
>>>
>>>java test test
>>>
>>>produces the output file test.html:
>>>
>>><?xml version="1.0" encoding="UTF-8"?>
>>><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>>>"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>>><html>
>>>  <head>
>>>    <title>Blah</title>
>>>  </head>
>>>  <body>
>>>    <p>&amp;quot; &amp;#169; blah blah</p>
>>>  </body>
>>></html>
>>>
>>>>JDOM at all.  They are expanded by the parser before JDOM ever "sees"
>>>>them.  So to keep your original character entities intact, you would
>>>>have to address this (in some way that I can't answer) by tweaking the
>>>>parser you use.
>>>>
>>>Ok, so it was my parser expanding the character entities but...
>>>
>>>Ampersands have been expanded to "&amp;", why is this?
>>>
>>>--SNIP--
>>>
>>>
>>>>>That's overstating it a bit, no? He's asking for a particular one of
>>>>>two forms that are completely equivalent in XML's eyes, right?
>>>>>
>>>--SNIP--
>>>
>>>This is a very good point, if they ARE equivalent then there should be
>>>the option to output either form.
>>>
>>>--SNIP--
>>>
>>>
>>>>>misunderstanding of XML. But there are certainly reasonable cases where
>>>>>something else might care, and you might want to have control over this
>>>>>(irrespective of this particular case).
>>>>>
>>>>>
>>>--SNIP--
>>>
>>>Definitely. But it seems as though the only case is the ampersand. Is
>>>this right?
>>>
>>>How can I output an ampersand verbatim?
>>>
>>>Regards
>>>
>>>Empty
>>>
>>


-- 

Paul Chapman

Email:  chapman at zemsys.com
Mobile: +61 418 340 935



