[jdom-interest] CDATA content is not preserved

Andreas Schaefer aschaefer at SeeBeyond.com
Wed Nov 24 15:47:28 PST 2004


Hi Geeks

I just stumbled over a problem with CDATA: the content of the text is
not preserved when it is read back from a file. I used the XMLOutputter
to write to a file and then the SAXBuilder to read from the file. Each
value is wrapped in a CDATA node and then added to an element.

This is the text I am trying to write on a Windows system:

            "testSingleRecordAndDropEol(), I am your 2." +
mLineSeparator + "(testSingleRecordAndDropEol()) test message"

where mLineSeparator is the line separator taken from the System
Properties.

This is the content of the CDATA node (read from the file), which is
correct with respect to the text above. The characters of the string are
in quotes and the number of each character is in brackets. Please pay
attention to the third line (I added a '>' and '<' around it):

't' (29) 'e' (14) 's' (28) 't' (29) 'S' (28) 'i' (18) 'n' (23) 'g' (16)
'l' (21) 'e' (14) 'R' (27) 'e' (14) 'c' (12) 'o' (24) 'r' (27) 'd' (13)
'A' (10) 'n' (23) 'd' (13) 'D' (13) 'r' (27) 'o' (24) 'p' (25) 'E' (14)
'o' (24) 'l' (21) '(' (-1) ')' (-1) ',' (-1) ' ' (-1) 'I' (18) ' ' (-1)
'a' (10) 'm' (22) ' ' (-1) 'y' (34) 'o' (24) 'u' (30) 'r' (27) ' ' (-1)
'2' (2) '.' (-1) '
>' (-1) '<
' (-1) '(' (-1) 't' (29) 'e' (14) 's' (28) 't' (29) 'S' (28) 'i' (18)
'n' (23) 'g' (16) 'l' (21) 'e' (14) 'R' (27) 'e' (14) 'c' (12) 'o' (24)
'r' (27) 'd' (13) 'A' (10) 'n' (23) 'd' (13) 'D' (13) 'r' (27) 'o' (24)
'p' (25) 'E' (14) 'o' (24) 'l' (21) '(' (-1) ')' (-1) ')' (-1) ' ' (-1)
't' (29) 'e' (14) 's' (28) 't' (29) ' ' (-1) 'm' (22) 'e' (14) 's' (28)
's' (28) 'a' (10) 'g' (16) 'e' (14)

This is what I get back from the CDATA element after the file is read by
the SAXBuilder:

't' (29) 'e' (14) 's' (28) 't' (29) 'S' (28) 'i' (18) 'n' (23) 'g' (16)
'l' (21) 'e' (14) 'R' (27) 'e' (14) 'c' (12) 'o' (24) 'r' (27) 'd' (13)
'A' (10) 'n' (23) 'd' (13) 'D' (13) 'r' (27) 'o' (24) 'p' (25) 'E' (14)
'o' (24) 'l' (21) '(' (-1) ')' (-1) ',' (-1) ' ' (-1) 'I' (18) ' ' (-1)
'a' (10) 'm' (22) ' ' (-1) 'y' (34) 'o' (24) 'u' (30) 'r' (27) ' ' (-1)
'2' (2) '.' (-1) '
' (-1) '(' (-1) 't' (29) 'e' (14) 's' (28) 't' (29) 'S' (28) 'i' (18)
'n' (23) 'g' (16) 'l' (21) 'e' (14) 'R' (27) 'e' (14) 'c' (12) 'o' (24)
'r' (27) 'd' (13) 'A' (10) 'n' (23) 'd' (13) 'D' (13) 'r' (27) 'o' (24)
'p' (25) 'E' (14) 'o' (24) 'l' (21) '(' (-1) ')' (-1) ')' (-1) ' ' (-1)
't' (29) 'e' (14) 's' (28) 't' (29) ' ' (-1) 'm' (22) 'e' (14) 's' (28)
's' (28) 'a' (10) 'g' (16) 'e' (14)

As you can see, the third line is missing when the content is read back
by the SAXBuilder. I guess the CDATA handling swallows either the line
feed or the carriage return. Because I am using an XML file to transfer
test data back from a server, I need to preserve the content exactly and
cannot afford to lose a character.
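
A possible explanation, sketched below: XML 1.0 (section 2.11) requires a
parser to normalize every "\r\n" pair, and every lone "\r", to a single
"\n" before parsing, and that applies inside CDATA sections as well. The
sketch uses the JDK's built-in DOM parser rather than JDOM so that it
runs on its own; the class name, helper names and the shortened sample
text are made up for illustration, and it is only an assumption that the
SAX parser behind the SAXBuilder behaves the same way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataLineEndDemo {

    // Parses the XML text and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Makes line-end characters visible when printing.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Hypothetical sample: a Windows line separator inside a CDATA section.
        String xml = "<msg><![CDATA[I am your 2.\r\n(second) test message]]></msg>";
        String text = rootText(xml);

        System.out.println("in document : " + visible(xml));
        System.out.println("after parse : " + visible(text));
        // The carriage return is gone after parsing; only the line feed remains.
        System.out.println("contains \\r : " + (text.indexOf('\r') >= 0));
    }
}
```

If this is what happens here, the lost character is the carriage return,
and it is dropped when the file is read, not when it is written.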

Or am I mistaken in thinking that CDATA preserves the content given to
it?

Have a nice Turkey Day (Gobble-Gobble)
Andreas Schaefer
Senior Software Engineer

Upcoming Maven Presentation @ LA-JUG 12/7/04




