[jdom-interest] jdom 1.0 XMLOutputter

Bradley S. Huffman hip at cs.okstate.edu
Wed Sep 22 13:21:42 PDT 2004


Try this

import org.jdom.*;
import org.jdom.output.*;
import org.jdom.input.*;
import java.io.*;

public class JdomTest3
{
  public static String[] testDocs = new String[]
  {
    "<r>aaa\nbbb</r>", // a single \n. nothing should happen
    "<r>aaa\r\nbbb</r>", // a sequence of \r\n, nothing should happen
    "<r>aaa\rbbb</r>" // a single \r, should be replaced with \n
  };

  public static void main(String[] args)
  {
    SAXBuilder builder = new SAXBuilder();
    try
    {
      int n = testDocs.length;
      for(int i = 0; i < n; i++)
      {
        ByteArrayInputStream stream = new ByteArrayInputStream(testDocs[i].getBytes());
        Document jdoc = builder.build(stream);
        Element root = jdoc.getRootElement();
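        Text t = (Text) root.getContent(0); // Should be a Text node
        // Print \r and \n as visible escapes so the line ends can be compared
        System.out.println(t.getText().replace("\r", "\\r").replace("\n", "\\n"));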
      }
    }
    catch (Exception e)
    {
      e.printStackTrace();
    }

  }
}

and notice that all three testDocs produce the same output, "aaa\nbbb". This
happens because the parser, not JDOM, normalizes each end-of-line sequence
(\r\n or a lone \r) to a single \n before passing the text to the application.
There is no way to tell whether the output "aaa\nbbb" came from testDocs[0],
testDocs[1], or testDocs[2]. This is what the XML spec requires the parser to do
(XML 1.0, section 2.11, End-of-Line Handling), so there is no way for JDOM or any
other application to know how to serialize the text back to its original form.
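
To confirm that the normalization comes from the parser and not from JDOM, here is
a minimal sketch that uses only the JDK's built-in SAX parser (javax.xml.parsers),
with no JDOM classes at all. The class name SaxEolTest and the helpers parseText
and visible are hypothetical, made up for this example; it assumes a conforming
JAXP parser, which must apply the XML 1.0 end-of-line rules.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEolTest
{
  // Hypothetical helper: parse a document and return all character data
  // the SAX parser reports. No JDOM is involved anywhere.
  public static String parseText(String doc) throws Exception
  {
    final StringBuilder text = new StringBuilder();
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    parser.parse(new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)),
        new DefaultHandler()
        {
          @Override
          public void characters(char[] ch, int start, int length)
          {
            text.append(ch, start, length);
          }
        });
    return text.toString();
  }

  // Hypothetical helper: show \r and \n as visible escapes.
  public static String visible(String s)
  {
    return s.replace("\r", "\\r").replace("\n", "\\n");
  }

  public static void main(String[] args) throws Exception
  {
    String[] docs = { "<r>aaa\nbbb</r>", "<r>aaa\r\nbbb</r>", "<r>aaa\rbbb</r>" };
    for (String doc : docs)
    {
      // Each document prints aaa\nbbb: the parser has already folded
      // \r\n and a lone \r into \n.
      System.out.println(visible(parseText(doc)));
    }
  }
}
```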

Now, for root.getContent(0) to return a Text node with a \r in it, the
original document must have had something like "<r>aaa&#xD;bbb</r>".
Otherwise, as above, the \r would have been normalized to a single \n.
Therefore, on serialization, every \r in text content needs to be escaped as
a character reference: it couldn't be there unless it was a character
reference in the original document.
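
The rule above can be sketched with the JDK alone. The escapeText helper below is
hypothetical (it is not JDOM's XMLOutputter API): it escapes the usual markup
characters plus every \r as &#xD;. Re-parsing shows the difference: written raw,
the \r is folded into \n by the parser; written as a character reference, it
survives. Class and method names here are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class CrEscapeTest
{
  // Hypothetical escaping for element text content: markup characters,
  // plus \r, which must become a character reference to survive a re-parse.
  public static String escapeText(String s)
  {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < s.length(); i++)
    {
      char ch = s.charAt(i);
      switch (ch)
      {
        case '<':  out.append("&lt;");  break;
        case '>':  out.append("&gt;");  break;
        case '&':  out.append("&amp;"); break;
        case '\r': out.append("&#xD;"); break;
        default:   out.append(ch);
      }
    }
    return out.toString();
  }

  // Parse a document and return the character data the parser reports.
  public static String parseText(String doc) throws Exception
  {
    final StringBuilder text = new StringBuilder();
    SAXParserFactory.newInstance().newSAXParser().parse(
        new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)),
        new DefaultHandler()
        {
          @Override
          public void characters(char[] ch, int start, int length)
          {
            text.append(ch, start, length);
          }
        });
    return text.toString();
  }

  public static void main(String[] args) throws Exception
  {
    String original = parseText("<r>aaa&#xD;bbb</r>"); // text is "aaa\rbbb"
    String raw = "<r>" + original + "</r>";              // \r written as-is
    String escaped = "<r>" + escapeText(original) + "</r>";
    System.out.println(parseText(raw).equals("aaa\nbbb"));     // true: \r normalized away
    System.out.println(parseText(escaped).equals("aaa\rbbb")); // true: \r preserved
    System.out.println(escaped);                               // <r>aaa&#xD;bbb</r>
  }
}
```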

Change to

  public static String[] testDocs = new String[]
  {
    "<r>aaa&#xA;bbb</r>", // \n as a char. reference, read back as \n
    "<r>aaa&#xD;&#xA;bbb</r>", // \r\n as char. references, both characters kept
    "<r>aaa&#xD;bbb</r>" // \r as a char. reference, kept as \r
  };

and see what it produces.

Brad

