[jdom-interest] PATCH: Whitespace in Element

Elliotte Rusty Harold elharo at metalab.unc.edu
Sat Jun 16 08:59:49 PDT 2001


At 8:14 AM -0700 6/16/01, guru at stinky.com wrote:
>Currently, Element.getTextNormalize only considers the following
>characters to be whitespace:
>  space, tab, \n, \r
>
>This breaks the Unicode and Java definitions of whitespace.  I'm not
>sure if it breaks XML's.
>

It doesn't. XML defines white space in Production 3 as

S ::=    (#x20 | #x9 | #xD | #xA)+

Several of the characters you cite, such as form feed and file
separator, are not allowed in XML documents at all: not in element
names, not in character data, not in CDATA sections, not in comments,
nowhere.
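
To make the mismatch concrete, here is a rough sketch that compares Java's
Character.isWhitespace with the two relevant XML 1.0 productions (S,
production 3, and Char, production 2). The class and method names are
made up for illustration; this is not JDOM code and not a proposed
patch.

```java
// Illustrative only: XmlWhitespaceSketch, isXmlWhitespace and isXmlChar
// are hypothetical names, not part of JDOM.
final class XmlWhitespaceSketch {

    // XML 1.0 production 3: S ::= (#x20 | #x9 | #xD | #xA)+
    // Tests a single code point against the four permitted characters.
    static boolean isXmlWhitespace(int c) {
        return c == 0x20 || c == 0x9 || c == 0xD || c == 0xA;
    }

    // XML 1.0 production 2: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    //                              | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // space, tab, line feed, carriage return, form feed, file separator
        int[] samples = {0x20, 0x9, 0xA, 0xD, 0xC, 0x1C};
        for (int c : samples) {
            System.out.printf("U+%04X  java=%b  xmlS=%b  xmlChar=%b%n",
                c, Character.isWhitespace(c), isXmlWhitespace(c), isXmlChar(c));
        }
    }
}
```

For form feed (U+000C) and file separator (U+001C), Java reports
whitespace, while both XML checks return false: the characters fall
outside the Char production, so a well-formed document cannot contain
them in the first place.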

I agree that XML and Unicode and Java are not consistent here. I 
think JDOM needs to fall on the XML side of the fence. Consequently I 
suggest rejecting this change.
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+


