[jdom-interest] More Verification

Elliotte Rusty Harold elharo at metalab.unc.edu
Mon Aug 21 17:17:03 PDT 2000


I've written versions of the

Verifier
Element
Attribute
ProcessingInstruction
Comment

classes that check the contents of these items, not just the names. 
In particular, they check every character of the contents to make 
sure it's legal parsed character data and is not, for example, a C0 
control character like NUL or form feed.
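To make "legal parsed character data" concrete, here is a minimal sketch of a per-character test against the XML 1.0 `Char` production. The class name `XmlCharCheck` and the method `isXmlChar(int)` are hypothetical, not JDOM's actual Verifier code. The test is written against Unicode code points, so the supplementary range above U+FFFF can be expressed directly.

```java
// Hypothetical helper, not JDOM's Verifier: tests one Unicode code point
// against the XML 1.0 Char production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
public class XmlCharCheck {

    public static boolean isXmlChar(int codePoint) {
        // Tab, line feed, and carriage return are the only legal C0 controls
        if (codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD) {
            return true;
        }
        if (codePoint >= 0x20 && codePoint <= 0xD7FF) return true;
        // Surrogates (0xD800-0xDFFF) fall in the gap and are rejected
        if (codePoint >= 0xE000 && codePoint <= 0xFFFD) return true;
        return codePoint >= 0x10000 && codePoint <= 0x10FFFF;
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar('A'));      // true
        System.out.println(isXmlChar('\t'));     // true
        System.out.println(isXmlChar(0x0));      // false: NUL
        System.out.println(isXmlChar(0xC));      // false: form feed
        System.out.println(isXmlChar(0x1F600));  // true: supplementary character
        System.out.println(isXmlChar(0xFFFE));   // false: not a legal XML character
    }
}
```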

These are based on the CVS tree from about a week and a half ago, because
I was offline last week while I was in Montreal. The changes 
shouldn't be too hard to merge in, though. For example, here's the new 
Attribute constructor:

     public Attribute(String name, String value, Namespace namespace) {
         // Reject an illegal attribute name first
         String reason = Verifier.checkAttributeName(name);
         if (reason != null) {
             throw new IllegalNameException(name, "attribute", reason);
         }

         // New: check every character of the value, not just the name
         reason = Verifier.checkPCDATA(value);
         if (reason != null) {
             throw new IllegalDataException(value, reason);
         }

         // A null namespace means "no namespace"
         if (namespace == null) {
             namespace = Namespace.NO_NAMESPACE;
         }

         // Assign fields only after all checks have passed
         this.name = name;
         this.value = value;
         this.namespace = namespace;
     }
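The same check-then-assign pattern can be tried outside JDOM with a small stand-in. In this sketch, `CheckedAttribute` is a hypothetical class, `IllegalArgumentException` replaces JDOM's `IllegalNameException` and `IllegalDataException`, and both the name check and the character check are deliberately simplified (the character check only rejects C0 controls other than tab, LF, and CR).

```java
// Hypothetical, self-contained sketch of the Attribute constructor's pattern:
// validate every argument first, then assign fields.
public class CheckedAttribute {
    private final String name;
    private final String value;
    private final String namespaceURI;

    public CheckedAttribute(String name, String value, String namespaceURI) {
        // Simplified name check: non-null and non-empty only
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Attribute name cannot be null or empty");
        }
        String reason = checkPCDATA(value);
        if (reason != null) {
            throw new IllegalArgumentException(reason);
        }
        // null means "no namespace", analogous to Namespace.NO_NAMESPACE
        this.namespaceURI = (namespaceURI == null) ? "" : namespaceURI;
        this.name = name;
        this.value = value;
    }

    // Simplified stand-in for Verifier.checkPCDATA: returns a reason or null
    static String checkPCDATA(String data) {
        if (data == null) {
            return "PCDATA cannot be null";
        }
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            // Reject C0 controls other than tab, line feed, and carriage return
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                return "Illegal character 0x" + Integer.toHexString(c) + " at position " + i;
            }
        }
        return null;
    }

    public String getName() { return name; }
    public String getValue() { return value; }
    public String getNamespaceURI() { return namespaceURI; }

    public static void main(String[] args) {
        CheckedAttribute ok = new CheckedAttribute("id", "a1", null);
        System.out.println(ok.getNamespaceURI().isEmpty()); // true
        try {
            new CheckedAttribute("id", "bad\fvalue", null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Illegal character 0xc at position 3
        }
    }
}
```

Because the checks run before any field is set, a rejected value surfaces as an exception at construction time rather than as a malformed attribute discovered later during output.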


The key is this new method in Verifier:

     /**
      * <p>
      *  This will ensure that a string is non-null
      *  and contains only legal Unicode characters
      *  allowed by the XML 1.0 specification. For example,
      *  most C0 controls, such as vertical tab, form feed, and bell,
      *  are forbidden.
      * </p>
      *
      * @param data <code>String</code> data to check.
      * @return <code>String</code> - reason data is invalid, or
      *         <code>null</code> if data is OK.
      */
     public static final String checkPCDATA(String data) {
         if (data == null) {
             return "PCDATA cannot be null";
         }

         for (int i = 0; i < data.length(); i++) {
             char c = data.charAt(i);
             if (!isXMLCharacter(c)) {
                 // Leading space keeps the hex code and "at" from running together
                 return "Illegal Character 0x" + Integer.toHexString(c) +
                   " at position " + i + " in String " + data;
             }
         }

         // If we got here, everything is OK
         return null;
     }

Everything's at http://metalab.unc.edu/xml/jdom/
I'll try to merge these into the latest tree soon, if nobody else 
gets to it first. I'm not sure what changes have been made in the 
last week. Then I want to take a look at the possibility of 
preserving namespace prefixes.

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+
