[jdom-interest] A suggested performance improvement

Tom Oke tomo at elluminate.com
Sun Mar 16 17:18:44 PST 2003


I have noticed, on large XML files, that the majority of the CPU time
is going into the routines: Verifier.isXMLCharacter and 
Verifier.checkCharacterData.

I initially modified isXMLCharacter to check the most likely range of
characters first, so that common characters return early. This cut
about 25% of the CPU time spent on the JDOM read of some large files.
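The post does not include that modified isXMLCharacter. The following is a minimal sketch of the reordering idea, assuming single UTF-16 code units and the XML 1.0 Char production. XmlCharCheck and isXmlChar are hypothetical names and are not JDOM's API. Like a per-char loop, this sketch rejects surrogate code units and makes no attempt to decode surrogate pairs.

```java
// Hypothetical sketch: XmlCharCheck/isXmlChar are illustrative names, not JDOM's API.
final class XmlCharCheck {

    private XmlCharCheck() {}

    /**
     * Returns true if the UTF-16 code unit c is allowed by the XML 1.0 Char
     * production. The range that ordinary text hits most often is tested
     * first, so the common case returns after two comparisons.
     * Surrogate code units (0xD800-0xDFFF) are rejected; pairs are not decoded.
     */
    static boolean isXmlChar(char c) {
        // Most likely range: space through 0xD7FF (ASCII, Latin, CJK, ...).
        if (c >= 0x20 && c <= 0xD7FF) {
            return true;
        }
        // The only legal characters below 0x20 are tab, line feed and carriage return.
        if (c == '\t' || c == '\n' || c == '\r') {
            return true;
        }
        // Upper BMP range, excluding the surrogate block and U+FFFE/U+FFFF.
        return c >= 0xE000 && c <= 0xFFFD;
    }
}
```

Ordering the branches this way does not change which characters are accepted. It only changes how many comparisons typical text costs.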

However, in the thread doing the JDOM input, 62% of the time
was still in isXMLCharacter and 16% was in checkCharacterData,
which calls isXMLCharacter.

The biggest gain came from wrapping the if statement that calls
isXMLCharacter in a test for the most likely valid range, so that
isXMLCharacter is only called for characters outside 0x20 to 0xD7FF.
This is done by the two lines below:

            char c = text.charAt(i);
            if (!(c > 0x1F && c < 0xD800)) {

This reduced checkCharacterData to 1.32% of the thread's time,
and isXMLCharacter essentially no longer shows up in the profile.
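The guard is only safe if every character it lets through would also pass the full test. Because a Java char has just 65,536 values, this can be checked exhaustively. The sketch below is a hypothetical harness: fullCheck is an assumed reference implementation of the XML 1.0 Char production for single code units, not JDOM's isXMLCharacter. guardedCheck uses the same condition as the two lines above.

```java
// Hypothetical harness: confirms the fast-path guard never changes the verdict
// of a full XML 1.0 character test, for every possible char value.
final class FastPathEquivalence {

    private FastPathEquivalence() {}

    // Assumed reference test: XML 1.0 Char production, single UTF-16 code units only.
    static boolean fullCheck(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Guarded form from the post: skip the full test for 0x20..0xD7FF.
    static boolean guardedCheck(char c) {
        if (!(c > 0x1F && c < 0xD800)) {
            return fullCheck(c);
        }
        return true;
    }

    /** Counts the char values on which the two checks disagree. */
    static int countMismatches() {
        int mismatches = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (fullCheck(c) != guardedCheck(c)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches()); // prints "mismatches: 0"
    }
}
```

A mismatch count of zero means the guard behaves as a pure shortcut. Inside 0x20 to 0xD7FF it returns what the full test would return, and outside that range it defers to the full test.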

Is this a reasonable change to submit to JDOM?

What follows is the full code for Verifier.checkCharacterData.



    public static final String checkCharacterData(String text) {
        if (text == null) {
            return "A null is not a legal XML value";
        }

        // do check
        for (int i = 0, len = text.length(); i<len; i++) {
            char c = text.charAt(i);
            // Fast path: 0x20 to 0xD7FF are always legal XML characters,
            // so only call isXMLCharacter for characters outside that range.
            if (!(c > 0x1F && c < 0xD800)) {
                if (!isXMLCharacter(c)) {
                    // Likely this character can't be easily displayed
                    // because it's a control character, so we use its
                    // hexadecimal representation in the reason.
                    return ("0x" + Integer.toHexString(c)
                            + " is not a legal XML character");
                }
            }
        }

        // If we got here, everything is OK
        return null;
    }
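To show what callers see, here is a self-contained sketch that wraps the same loop in a small class. CheckDemo is a hypothetical name. Its isXMLCharacter is an assumed stand-in covering the XML 1.0 Char production for single code units, not JDOM's own Verifier.isXMLCharacter.

```java
// Hypothetical harness: CheckDemo and its isXMLCharacter stand-in are illustrative.
final class CheckDemo {

    private CheckDemo() {}

    // Assumed stand-in for Verifier.isXMLCharacter (XML 1.0 Char, single code units).
    static boolean isXMLCharacter(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Same logic as the patched checkCharacterData: null means "no problem found".
    static String checkCharacterData(String text) {
        if (text == null) {
            return "A null is not a legal XML value";
        }
        for (int i = 0, len = text.length(); i < len; i++) {
            char c = text.charAt(i);
            // Fast path: only characters outside 0x20..0xD7FF reach the full test.
            if (!(c > 0x1F && c < 0xD800) && !isXMLCharacter(c)) {
                return "0x" + Integer.toHexString(c) + " is not a legal XML character";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkCharacterData("plain text")); // prints "null"
        System.out.println(checkCharacterData("bell\u0007")); // prints "0x7 is not a legal XML character"
    }
}
```

The method reports problems as a reason string rather than throwing, so callers decide how to handle an illegal character. A null return means the text passed.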

Tom Oke


