[jdom-interest] Fwd: Formatting differences after migrating to JDOM2

Rolf jdom at tuis.net
Sun Oct 6 13:05:59 PDT 2013


Hi Robert.

OK. I have spent some time going through things, and, admittedly, this 
is confusing, and working through the combinations/permutations for 
formatting is liable to end in a headache.

So, I think there are a number of issues at hand in your case:
1. JDOM2 is doing different things than JDOM1.
2. JDOM1 is probably doing the wrong thing in this case.
3. JDOM2 is also probably doing the wrong thing but, in fairness,
changing the 'TextMode' of a PrettyPrint format is a 'dangerous' thing
.... not by design, but because of the way the formatter is actually
implemented and the choices it makes for the pretty format.
4. If whitespace is significant for certain members of an XML document
then you should not be relying on the whim of JDOM to make things right;
you should be using the xml:space="preserve" mechanism that is
designed for this purpose.

So, here are a few 'answers'.

Answer 0:
=====================================================
The output you were getting from JDOM 1.x is broken. If you have a
'preserve' text mode then there should be no whitespace between any
elements (indenting/newlines) because that is not 'preserved' space
(it's 'invented' whitespace).

The output you have been depending on relies on a bug in JDOM 1.x.

Answer 1:
=====================================================
The "right" thing for you to do is to add the xml:space="preserve"
attribute to the sub2 elements:

     import org.jdom2.Attribute;
     import org.jdom2.Document;
     import org.jdom2.Element;
     import org.jdom2.Namespace;
     import org.jdom2.output.Format;
     import org.jdom2.output.XMLOutputter;

     public class JDOMOutput {
         public static void main(String[] argv) throws Exception {
             Document document = new Document();
             // Template attribute; each element gets its own clone of it.
             Attribute cloneme = new Attribute("space", "preserve", Namespace.XML_NAMESPACE);
             Element root = new Element("root");
             document.addContent(root);
             Element sub1 = new Element("sub1");
             root.addContent(sub1);
             sub1.addContent(new Element("sub2")
                     .setText("Some text")
                     .setAttribute(cloneme.clone()));
             sub1.addContent(new Element("sub2")
                     .setText("  text with left and right whitespace  ")
                     .setAttribute(cloneme.clone()));
             Format fmt = Format.getPrettyFormat();
             XMLOutputter xout = new XMLOutputter(fmt);
             xout.output(document, System.out);
         }
     }

Gives the output:

<root>
   <sub1>
     <sub2 xml:space="preserve">Some text</sub2>
     <sub2 xml:space="preserve">  text with left and right whitespace  </sub2>
   </sub1>
</root>

Answer 2:
=====================================================
The "OK" thing for you to do is to use TextMode.TRIM_FULL_WHITE
instead of TextMode.PRESERVE. The default TextMode for PrettyPrint is
TextMode.TRIM, which removes whitespace from both ends of the text,
whereas TRIM_FULL_WHITE removes whitespace only when the text consists
entirely of whitespace, and leaves the text untouched if it contains
any non-whitespace characters. I want you to be aware that other tools
(JDOM, xmllint) have the right to mess with the whitespace (
http://www.w3.org/TR/REC-xml/#sec-white-space ). It is only by
convention that the following will work in JDOM (I still recommend
preserving whitespace correctly with xml:space="preserve"):

     public static void main(String argv[]) throws Exception{
         Document document = new Document();
         Element root = new Element("root");
         document.addContent(root);
         Element sub1 = new Element("sub1");
         root.addContent(sub1);
         sub1.addContent(new Element("sub2").setText("Some text"));
         sub1.addContent(new Element("sub2").setText("  text with left and right whitespace  "));
         Format fmt = Format.getPrettyFormat();
         fmt.setTextMode(Format.TextMode.TRIM_FULL_WHITE);
         XMLOutputter xout = new XMLOutputter(fmt);
         xout.output(document, System.out);
     }

Gives the output:

<root>
   <sub1>
     <sub2>Some text</sub2>
     <sub2>  text with left and right whitespace  </sub2>
   </sub1>
</root>
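
To make the difference between the text modes concrete, here is a small
plain-Java sketch of the rules as described above. It is not JDOM code
and does not use JDOM; the class TextModeSketch, its Mode enum and the
apply() method are names made up for this illustration, and only three
of the modes are modelled.

```java
/** Plain-Java sketch of the text modes described above; hypothetical, not JDOM code. */
class TextModeSketch {

    /** Illustrative subset of the modes discussed in this thread. */
    enum Mode { PRESERVE, TRIM, TRIM_FULL_WHITE }

    /** XML whitespace characters: space, tab, carriage return, line feed. */
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /** Returns the text as it would be written under the given mode. */
    static String apply(Mode mode, String text) {
        int start = 0;
        int end = text.length();
        // Find the first and last non-whitespace positions.
        while (start < end && isXmlWhitespace(text.charAt(start))) {
            start++;
        }
        while (end > start && isXmlWhitespace(text.charAt(end - 1))) {
            end--;
        }
        switch (mode) {
            case TRIM:
                // Strip whitespace from both ends.
                return text.substring(start, end);
            case TRIM_FULL_WHITE:
                // Drop the text only if it is entirely whitespace.
                return start == end ? "" : text;
            default:
                // PRESERVE: leave the text exactly as it is.
                return text;
        }
    }

    public static void main(String[] args) {
        String padded = "  text with left and right whitespace  ";
        for (Mode m : Mode.values()) {
            System.out.println(m + ": [" + apply(m, padded) + "]");
        }
    }
}
```

In this sketch only TRIM changes the padded sub2 text; TRIM_FULL_WHITE
leaves it alone but would drop a text node that holds nothing except
whitespace.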

Answer 3:
=====================================================
JDOM 2.x uses a different (faster, and more flexible) algorithm for
output handling. This algorithm has two major triggers: the TextMode and
the Indent. PrettyPrint sets the TextMode to TRIM and the Indent to two
spaces "  ". The TRIM mode tells JDOM it can mess with whitespace in
Text. The Indent tells JDOM it can mess with the formatting of the XML
structure (setting it to null tells JDOM not to mess with any indenting).

You have been changing the TextMode to PRESERVE and, as I think about
that, JDOM should never mess with the indenting when the mode is
PRESERVE. JDOM has code to make sure that it manages the Indent and the
TextMode correctly when they need to change internally, but you are
basically creating an invalid situation by setting an Indent and
PRESERVE at the same time. JDOM should handle that better.

But the right thing for JDOM2 to do when you set PRESERVE is to output
the following:
<root><sub1><sub2>Some text</sub2><sub2>  text with left and right whitespace  </sub2></sub1></root>

So, I think there's a bug in JDOM2: given the input you have
(Format.getPrettyFormat().setTextMode(TextMode.PRESERVE)), it should be
outputting the above (which is not what you want).
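
The rule I am describing can be sketched in a few lines of plain Java.
This is only an illustration of the intended behaviour, not JDOM's
actual algorithm: the Node class and the write() method are hypothetical,
the boolean 'preserve' stands in for TextMode.PRESERVE versus TRIM, and
there is no escaping of special characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini tree and writer; illustrates the rule only, not JDOM's code. */
class IndentRuleSketch {

    static final class Node {
        final String name;
        final String text; // null when the element has no text
        final List<Node> children = new ArrayList<>();

        Node(String name, String text) {
            this.name = name;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * When preserve is true the indent is ignored, so no whitespace is invented.
     * Otherwise text is trimmed and each child element goes on its own indented line.
     */
    static String write(Node root, String indent, boolean preserve) {
        StringBuilder out = new StringBuilder();
        write(root, preserve ? null : indent, preserve, 0, out);
        return out.toString();
    }

    private static void write(Node n, String indent, boolean preserve, int depth, StringBuilder out) {
        out.append('<').append(n.name).append('>');
        if (n.text != null) {
            out.append(preserve ? n.text : n.text.trim());
        }
        for (Node child : n.children) {
            if (indent != null) {
                newline(out, indent, depth + 1);
            }
            write(child, indent, preserve, depth + 1, out);
        }
        if (indent != null && !n.children.isEmpty()) {
            newline(out, indent, depth);
        }
        out.append("</").append(n.name).append('>');
    }

    private static void newline(StringBuilder out, String indent, int depth) {
        out.append('\n');
        for (int i = 0; i < depth; i++) {
            out.append(indent);
        }
    }
}
```

With preserve set to true this writes the whole document on one line
with the sub2 text untouched, regardless of the indent that was
requested; with preserve set to false and a two-space indent it writes
the usual pretty layout with trimmed text.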

Answer 4:
=====================================================
You can use the Raw format (Format.getRawFormat()), and output the
spaces yourself by adding your own indenting and newlines to the
document before you write it.
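
As a rough idea of what 'adding your own indenting' means, here is a
plain-Java sketch on a hypothetical mini tree (the Elem class and the
indent() and writeRaw() methods are made up for this example and are not
JDOM API). In JDOM the same idea would mean inserting whitespace text
content between the child elements yourself and then writing with the
Raw format. Elements that contain any text are left alone; only
element-only content gets newlines and indents. There is no escaping,
for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini tree; sketches hand-made indenting followed by a raw write. */
class RawIndentSketch {

    /** Content items are either String (text) or Elem (child element). */
    static final class Elem {
        final String name;
        final List<Object> content = new ArrayList<>();

        Elem(String name) {
            this.name = name;
        }

        Elem add(Object item) {
            content.add(item);
            return this;
        }
    }

    /** Adds newline/indent text around child elements of element-only content. */
    static void indent(Elem e, String unit, int depth) {
        boolean hasText = false;
        boolean hasElem = false;
        for (Object item : e.content) {
            if (item instanceof Elem) {
                hasElem = true;
            } else {
                hasText = true;
            }
        }
        if (hasText || !hasElem) {
            return; // text-only, mixed or empty: do not touch the content
        }
        List<Object> children = new ArrayList<>(e.content);
        e.content.clear();
        for (Object child : children) {
            e.content.add("\n" + repeat(unit, depth + 1));
            indent((Elem) child, unit, depth + 1);
            e.content.add(child);
        }
        e.content.add("\n" + repeat(unit, depth));
    }

    static String repeat(String s, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    /** "Raw" write: emits exactly what is in the tree, nothing added or removed. */
    static String writeRaw(Elem e) {
        StringBuilder out = new StringBuilder();
        writeRaw(e, out);
        return out.toString();
    }

    private static void writeRaw(Elem e, StringBuilder out) {
        out.append('<').append(e.name).append('>');
        for (Object item : e.content) {
            if (item instanceof Elem) {
                writeRaw((Elem) item, out);
            } else {
                out.append(item);
            }
        }
        out.append("</").append(e.name).append('>');
    }
}
```

Calling indent(root, "  ", 0) and then writeRaw(root) on the
root/sub1/sub2 example gives an indented structure in which the sub2
text keeps its leading and trailing spaces. The cost is that the
whitespace is now part of the document itself.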





On 06/10/2013 1:09 PM, Robert Krüger wrote:
> Hi Rolf,
>
> On Sat, Oct 5, 2013 at 2:08 AM, Rolf <jdom at tuis.net> wrote:
>> Hi Robert.
>>
>> Just so we are on the same page, when I run the code, I get the following
>> output:
>>
>> with the setTextMode(...):
>>          new XMLOutputter(Format.getPrettyFormat().setTextMode(Format.TextMode.PRESERVE))
>>                  .output(document, System.out);
>> <?xml version="1.0" encoding="UTF-8"?>
>> <root>
>>    <sub1>
>>      <sub2>
>>        Some text
>>      </sub2><sub2>
>>
>>          text with left and right whitespace
>>      </sub2>
>>    </sub1>
>> </root>
>>
>>
>> without the setTextMode(...)
>>          new XMLOutputter(Format.getPrettyFormat()).output(document, System.out);
>> <?xml version="1.0" encoding="UTF-8"?>
>> <root>
>>    <sub1>
>>      <sub2>Some text</sub2>
>>      <sub2>text with left and right whitespace</sub2>
>>    </sub1>
>> </root>
>>
>> The plain "Pretty" format is the way I think you want the output, and it is
>> right, right?
> Yes, except for whitespace being trimmed. I do not want that but want
> indenting and no whitespace trimming for text-only elements (that was
> the behaviour of JDOM1). The use case is that I use xml to store data
> (e.g. user input of a content management system) and removing
> whitespace modifies the data, which I do not want to happen but I do
> want indenting.
>
>> It is very unusual for someone using the PrettyFormat to modify the
>> TextMode.... I wonder why you have the setTextMode() at all...
> see above.
>
>> Rolf
> Robert
>
>>
>>
>> On 30/09/2013 9:43 AM, Robert Krüger wrote:
>>> forgot to reply to the list
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Robert Krüger <krueger at lesspain.de>
>>> Date: Mon, Sep 30, 2013 at 3:42 PM
>>> Subject: Re: [jdom-interest] Formatting differences after migrating to
>>> JDOM2
>>> To: Rolf <jdom at tuis.net>
>>>
>>>
>>> This reproduces the behaviour:
>>>
>>> import org.jdom2.Document;
>>> import org.jdom2.Element;
>>> import org.jdom2.output.Format;
>>> import org.jdom2.output.XMLOutputter;
>>>
>>> public class JDOMOutput {
>>>
>>>       public static void main(String argv[]) throws Exception{
>>>           Document document = new Document();
>>>           Element root = new Element("root");
>>>           document.addContent(root);
>>>           Element sub1 = new Element("sub1");
>>>           root.addContent(sub1);
>>>           sub1.addContent(new Element("sub2").setText("Some text"));
>>>           sub1.addContent(new Element("sub2").setText("  text with left and right whitespace  "));
>>>           new XMLOutputter(Format.getPrettyFormat().setTextMode(Format.TextMode.PRESERVE))
>>>                   .output(document, System.out);
>>>       }
>>>
>>> }
>>>
>>> Try with and without the setTextMode(Format.TextMode.PRESERVE). None
>>> of them does what I need.
>>>
>>> On Sun, Sep 29, 2013 at 7:10 PM, Robert Krüger <krueger at lesspain.de>
>>> wrote:
>>>> Hi,
>>>>
>>>> it is part of a large application. I will try to build a simple test
>>>> program that demonstrates the effect.
>>>>
>>>> Cheers,
>>>>
>>>> Robert
>>>>
>>>> On Sun, Sep 29, 2013 at 5:26 PM, Rolf <jdom at tuis.net> wrote:
>>>>> Hi Robert.
>>>>>
>>>>> This is surprising indeed, and I agree it should not be different
>>>>> from JDOM 1.x.
>>>>>
>>>>> Can you get me a copy of the input file and the relevant parts of
>>>>> the Java code? You don't need to CC the whole list, it is large...
>>>>>
>>>>> Thanks
>>>>>
>>>>> Rolf
>>>>>
>>>>>
>>>>> On 29/09/2013 10:42 AM, Robert Krüger wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I just migrated my code from JDOM to JDOM2 and noticed some of our
>>>>>> unit tests failed. The reason is different formatting. I used
>>>>>> Format.getPrettyFormat().setTextMode(PRESERVE) for the formatting and
>>>>>> with jdom this produced output like the following
>>>>>>
>>>>>> <av-container format-version="0.3.4">
>>>>>>      <container-format>MP4</container-format>
>>>>>>      <bitrate>646448</bitrate>
>>>>>>      <duration>2002002</duration>
>>>>>>      <start-time>0</start-time>
>>>>>>      <acquisition-timestamp>1340887741000</acquisition-timestamp>
>>>>>>      <stream>
>>>>>>        <type>VIDEO</type>
>>>>>>        <codec>H.264</codec>
>>>>>> ...
>>>>>>
>>>>>> after replacing the imports by jdom2 I got
>>>>>>
>>>>>> <av-container format-version="0.3.4">
>>>>>>      <container-format>
>>>>>>        MP4
>>>>>>      </container-format><bitrate>
>>>>>>        646448
>>>>>>      </bitrate><duration>
>>>>>>        2002002
>>>>>>      </duration><start-time>
>>>>>>        0
>>>>>>      </start-time><acquisition-timestamp>
>>>>>>        1340887741000
>>>>>>      </acquisition-timestamp><stream>
>>>>>>        <type>
>>>>>>          VIDEO
>>>>>>        </type><codec>
>>>>>>          H.264
>>>>>>        </codec>...
>>>>>>
>>>>>> This looks rather broken as it does not preserve the original data at
>>>>>> all with all those added newlines. Removing the setTextMode(PRESERVE)
>>>>>> restored the format to what is shown above but the reason I added
>>>>>> setTextMode(PRESERVE) was that without it, whitespace was trimmed and
>>>>>> I do not want that for elements with text content.
>>>>>>
>>>>>> Is this a bug? How can I achieve what I want, i.e. have a "pretty",
>>>>>> i.e. indented format and have text-only elements preserve whitespace?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Robert
>>>>>> _______________________________________________
>>>>>> To control your jdom-interest membership:
>>>>>> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>>>>>>
>>
