[jdom-interest] Bug in CDATA reading/writing?

Mark Roder roder at is.com
Thu Mar 15 07:06:59 PST 2001


 
I am still a bit confused by this.  I use JDOM, but I don't know all the ins
and outs of XML.  This is why I like JDOM - it keeps things simple! Damn great tool!

To me, CDATA is just a String. The spec points the same way:
"Definition: CDATA sections may occur anywhere character data may occur;
they are used to escape blocks of text containing characters which would
otherwise be recognized as markup. "

So, since CDATA is "just" a string, why is
XMLOutputter.printCDATASection() written so that, instead of printing the
string exactly as it is, it can add spaces to the front and a newline to
the end, which changes the value of the string?

To me having:
    protected void printCDATASection(...)
    {
        indent(out, indentLevel);
        out.write(cdata.getSerializedForm());
        maybePrintln(out);
    }
is just like having the code read like this - and this makes no sense to me.
   protected void printString(...)
   {
        indent(out, indentLevel);
        out.write(string);
        maybePrintln(out);
   }

Wouldn't the following changes "fix" this?   
    protected void printCDATASection(...)
    {
        out.write(cdata.getSerializedForm());
    }

printElement(...) would need to be changed as well.  It currently keys off
the stringOnly boolean to decide whether to call maybePrintln() before it
calls printElementContent().  That check should instead test whether the
first content item is an instance of String or CDATA.

Am I way off base here?  To me it looks like XMLOutputter is tainting data.
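
A minimal sketch of the change proposed above, written without JDOM so that it stands alone.  Everything in it is hypothetical: the Elem and Cdata classes and the InlineTextOutputterSketch name are stand-ins for illustration, not JDOM classes, and the logic is only one reading of the proposal.  Padding is added only when an element's first content item is a child element; String and CDATA content is written exactly as stored.

```java
import java.util.ArrayList;
import java.util.List;

final class InlineTextOutputterSketch {
    /** Hypothetical stand-in for a CDATA node; not the JDOM class. */
    static final class Cdata {
        final String text;
        Cdata(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for an element holding Strings, Cdata and Elems. */
    static final class Elem {
        final String name;
        final List<Object> content = new ArrayList<>();
        Elem(String name) { this.name = name; }
        Elem add(Object item) { content.add(item); return this; }
    }

    private final String indent;
    private final boolean newlines;

    InlineTextOutputterSketch(String indent, boolean newlines) {
        this.indent = indent;
        this.newlines = newlines;
    }

    String output(Elem root) {
        StringBuilder out = new StringBuilder();
        printElement(root, out, 0);
        return out.toString();
    }

    private void printElement(Elem e, StringBuilder out, int level) {
        out.append('<').append(e.name).append('>');
        // The proposed check: pad only when the content starts with a child
        // element, never when it starts with a String or a CDATA section.
        boolean pad = !e.content.isEmpty() && e.content.get(0) instanceof Elem;
        for (Object item : e.content) {
            if (item instanceof Elem) {
                if (pad) {
                    newlineAndIndent(out, level + 1);
                }
                printElement((Elem) item, out, level + 1);
            } else if (item instanceof Cdata) {
                // Written verbatim. A real outputter would also have to
                // handle "]]>" inside the text; this sketch does not.
                out.append("<![CDATA[").append(((Cdata) item).text).append("]]>");
            } else {
                out.append(escape(item.toString()));
            }
        }
        if (pad) {
            newlineAndIndent(out, level);
        }
        out.append("</").append(e.name).append('>');
    }

    private void newlineAndIndent(StringBuilder out, int level) {
        if (newlines) {
            out.append('\n');
        }
        for (int i = 0; i < level; i++) {
            out.append(indent);
        }
    }

    private static String escape(String s) {
        // Escape '&' first so the entities produced below are not re-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Elem root = new Elem("doc")
                .add(new Elem("code").add(new Cdata("a < b")))
                .add(new Elem("p").add("hi"));
        System.out.println(new InlineTextOutputterSketch("  ", true).output(root));
    }
}
```

With a two-space indent and newlines on, the child elements still land on their own indented lines, while the CDATA and String values stay untouched inside their tags.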

The system I am working on handles both Elements built in memory and
Elements read from files generated by outside systems.  Most documents I
send to XMLOutputter have a mixed group of elements, so now I am getting
large files with only a few lines instead of hundreds of lines.  I am
concerned that an outside system is going to run into a line length issue.

Later

Mark

-----Original Message-----
From: Jason Hunter
To: Mark Roder
Cc: 'jdom-interest at jdom.org'
Sent: 3/14/01 3:28 PM
Subject: Re: [jdom-interest] Bug in CDATA reading/writing?

Mark Roder wrote:
> 
> I am using JDOM-B6.
> 
> I am noticing if I read a file, write a file and then read it back in
> again, I get different data the second time I read it when the file
> contains CDATA.
> 
> This seems weird to me.  Is this a bug in the code or something that
> should be documented some more?  It was surprising to me.

It's working as spec'd.

>         //Breaks  things on next line
>         //XMLOutputter xmlOut = new XMLOutputter("",true);
>         //XMLOutputter xmlOut = new XMLOutputter("  ",true);

The "true" indicates to add new lines.  Thus it adds new lines.

>         //Breaks  adds the spaces to the front
>         //XMLOutputter xmlOut = new XMLOutputter("  ");
>         //XMLOutputter xmlOut = new XMLOutputter("  ",false);

The "  " indicates to add a two-space indent.  Thus it adds a two-space
indent.

>         //Works
>         //XMLOutputter xmlOut = new XMLOutputter("");
>         //XMLOutputter xmlOut = new XMLOutputter("",false);
>         XMLOutputter xmlOut = new XMLOutputter();

The "" and false say not to add indenting or new lines.  Thus your
output is the same as your input.

Since by default in XML all whitespace is preserved, you generally want
to output without adding any new whitespace.  But if you create elements
in-memory then you probably don't add padding whitespace so you want the
outputter to add the padding for you.  That's why you have that option.
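
To make the whitespace point concrete, here is a small sketch that uses only the JDK's built-in DOM parser rather than JDOM.  The class name, the rootText helper and the sample documents are made up for illustration.  It parses the same CDATA value once without padding and once with an added two-space indent and newlines, then compares the text the parser hands back.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataWhitespaceDemo {
    /** Parses the XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped so callers do not have to deal with checked exceptions.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The value as written with no indent and no newlines.
        String compact = "<note><![CDATA[a < b]]></note>";
        // The same value after an outputter added a two-space indent and newlines.
        String padded = "<note>\n  <![CDATA[a < b]]>\n</note>";

        // Brackets make the leading and trailing whitespace visible.
        System.out.println("[" + rootText(compact) + "]");
        System.out.println("[" + rootText(padded) + "]");
        System.out.println("same value: " + rootText(compact).equals(rootText(padded)));
    }
}
```

No DTD is involved here, so the parser has no way to tell that the padding is ignorable and keeps it as part of the element's text.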

-jh-



