[jdom-interest] Improving performance of SAX parser configuration

Jason Hunter jhunter at servlets.com
Thu May 7 18:34:15 PDT 2009


Hi Scott,

Thanks for sending in what looks like a really good improvement!  I  
plan to add this to the codebase for the next release.  If anyone has  
issues, speak up now.

-jh-

On May 7, 2009, at 10:44 AM, Scott Emmons wrote:

> Greetings jdom-interest,
>
> I've run across an interesting performance issue in the way JDOM
> handles Xerces parser configuration even when reuseParser is enabled
> in SAXBuilder, and I wanted to run this by the list - not only for
> validation, but hopefully something along the lines of this
> improvement can get rolled in (yep, I know JDOM is in maintenance
> mode).
>
> For a bit of background, the particular case we have involves parsing
> lots and lots of little XML document fragments via SAXBuilder.build()
> - not terribly efficient, but for what we use JDOM for it's a
> pre-existing condition that we're stuck with.
>
> What I found is that more time was spent in configureParser() than in
> actually parsing the XML. The reason is that configureParser() tries
> to set options on the parser which don't exist in Xerces - or at least
> the version of it we are using - and each such attempt throws a
> SAXNotRecognizedException. Exceptions are expensive, plus Xerces does
> ResourceBundle lookups each time. While we do set reuseParser, each
> execution of build() still reconfigures the underlying parser.
>
> I know that the contentHandler, and perhaps other options, are not
> reusable, and this doesn't change the semantics of that. Since the
> underlying parser is unlikely to suddenly start supporting some option
> it didn't support before, it's possible to remember whether or not the
> underlying parser implementation was able to support a property, and
> skip attempting to configure it if not. I wired this as a specific
> option only used with reuseParser to be safe, but it's possible this
> could be done in a more generic manner that would benefit other
> codepaths and usages as well (it would simplify my patch somewhat, but I
> wanted to be safe since there may be other consequences of this which
> I've overlooked).
>
> Again, I wouldn't expect this to help cases where larger XML is
> handled less frequently, but for my case where it's hundreds of XML
> fragments per transaction per second, this fix cuts the execution
> time of SAXBuilder.build() roughly in half.
>
> I would love to hear any feedback as well as find out if anyone else
> has the same sort of performance improvements I've seen with this
> patch in cases where lots of small documents are parsed.
>
> Thanks for your time,
> -Scott
>
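The per-build cost described above can be reproduced with a minimal sketch that uses only the JDK's bundled SAX parser rather than JDOM. It reuses one XMLReader for many tiny documents and, on every build, repeats the same lexical-handler setup order as configureParser(): the legacy http://xml.org/sax/handlers/LexicalHandler name first, then the standard http://xml.org/sax/properties/lexical-handler name, counting the exceptions thrown. The class and method names (ReconfigureCost, configure, parseMany) are illustrative assumptions. How many exceptions each build throws depends on which names the installed parser recognizes, but because nothing is remembered between builds, that count is paid again on every call.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class ReconfigureCost {
    // Tried in the same order as the lexical-handler setup in configureParser().
    static final String LEGACY = "http://xml.org/sax/handlers/LexicalHandler";
    static final String STANDARD = "http://xml.org/sax/properties/lexical-handler";

    /** Mimics the per-build lexical-handler setup; returns exceptions thrown. */
    static int configure(XMLReader parser, DefaultHandler2 handler) {
        int failures = 0;
        for (String name : new String[] {LEGACY, STANDARD}) {
            try {
                parser.setProperty(name, handler);
                return failures;              // first name that works wins
            } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
                failures++;                   // an exception is built and thrown each time
            }
        }
        return failures;
    }

    /** Parses n tiny documents with one reused reader; returns total exceptions. */
    static int parseMany(int n) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        int totalFailures = 0;
        for (int i = 0; i < n; i++) {
            DefaultHandler2 handler = new DefaultHandler2(); // fresh handler per build
            parser.setContentHandler(handler);
            totalFailures += configure(parser, handler);     // nothing remembered
            parser.parse(new InputSource(new StringReader("<a>" + i + "</a>")));
        }
        return totalFailures;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("exceptions thrown: " + parseMany(1000));
    }
}
```

The total grows linearly with the number of builds, which is exactly the pattern that hurts when many small fragments are parsed.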
> ===CUT HERE===
> diff --git a/src/java/org/jdom/input/SAXBuilder.java b/src/java/org/jdom/input/SAXBuilder.java
> index 09fbb00..1627345 100644
> --- a/src/java/org/jdom/input/SAXBuilder.java
> +++ b/src/java/org/jdom/input/SAXBuilder.java
> @@ -134,6 +134,15 @@ public class SAXBuilder {
>     /** User-specified properties to be set on the SAX parser */
>     private HashMap properties = new HashMap(5);
>
> +    /** Whether to use fast parser reconfiguration */
> +    private boolean fastReconfigure = false;
> +
> +    /** Whether to try lexical reporting in fast parser reconfiguration */
> +    private boolean tryLexicalReportingConfig = true;
> +
> +    /** Whether to try entity expansion in fast parser reconfiguration */
> +    private boolean tryEntityExpandConfig = true;
> +
>     /**
>      * Whether parser reuse is allowed.
>      * <p>Default: <code>true</code></p>
> @@ -396,6 +405,25 @@ public class SAXBuilder {
>     }
>
>     /**
> +     * Specifies whether this builder will do fast reconfiguration of the
> +     * underlying SAX parser when reuseParser is true. This improves
> +     * performance in cases where SAXBuilders are reused and lots of small
> +     * documents are frequently parsed. This avoids attempting to set properties
> +     * on the SAX parser each time build() is called, which results in
> +     * SAXNotRecognizedExceptions. This should ONLY be set for builders where
> +     * this specific case is an issue. The default value of this setting is
> +     * <code>false</code> (no fast reconfiguration). If reuseParser is false,
> +     * calling this has no effect.
> +     *
> +     * @param fastReconfigure Whether to use fast parser reconfiguration.
> +     */
> +    public void setFastReconfigure(boolean fastReconfigure) {
> +        if (this.reuseParser) {
> +            this.fastReconfigure = fastReconfigure;
> +        }
> +    }
> +
> +    /**
>      * This sets a feature on the SAX parser. See the SAX documentation for
>      * </p>
>      * <p>
> @@ -657,42 +685,76 @@ public class SAXBuilder {
>              parser.setErrorHandler(new BuilderErrorHandler());
>         }
>
> -        // Setup lexical reporting.
> -        boolean lexicalReporting = false;
> -        try {
> -            parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
> -                               contentHandler);
> -            lexicalReporting = true;
> -        } catch (SAXNotSupportedException e) {
> -            // No lexical reporting available
> -        } catch (SAXNotRecognizedException e) {
> -            // No lexical reporting available
> -        }
> +        /* If fastReconfigure is enabled and we failed in the previous attempt
> +         * in configuring lexical reporting, then skip this step.
> +         */
> +        if (tryLexicalReportingConfig) {
> +            boolean configured = true;
>
> -        // Some parsers use alternate property for lexical handling (grr...)
> -        if (!lexicalReporting) {
> +            // Setup lexical reporting.
> +            boolean lexicalReporting = false;
>             try {
> -                parser.setProperty(
> -                    "http://xml.org/sax/properties/lexical-handler",
> -                    contentHandler);
> +                parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
> +                                   contentHandler);
>                 lexicalReporting = true;
>             } catch (SAXNotSupportedException e) {
>                 // No lexical reporting available
> +                configured = false;
>             } catch (SAXNotRecognizedException e) {
>                 // No lexical reporting available
> +                configured = false;
> +            }
> +
> +            // Some parsers use alternate property for lexical handling (grr...)
> +            if (!lexicalReporting) {
> +                try {
> +                    parser.setProperty(
> +                        "http://xml.org/sax/properties/lexical-handler",
> +                        contentHandler);
> +                    lexicalReporting = true;
> +                } catch (SAXNotSupportedException e) {
> +                    // No lexical reporting available
> +                    configured = false;
> +                } catch (SAXNotRecognizedException e) {
> +                    // No lexical reporting available
> +                    configured = false;
> +                }
> +            }
> +
> +            /* If unable to configure this property and fastReconfigure is
> +             * enabled, then set up to avoid this code path entirely next time.
> +             */
> +            if (!configured && fastReconfigure) {
> +                tryLexicalReportingConfig=false;
>             }
>         }
>
> -        // Try setting the DeclHandler if entity expansion is off
> -        if (!expand) {
> -            try {
> -                parser.setProperty(
> -                    "http://xml.org/sax/properties/declaration-handler",
> -                    contentHandler);
> -            } catch (SAXNotSupportedException e) {
> -                // No lexical reporting available
> -            } catch (SAXNotRecognizedException e) {
> -                // No lexical reporting available
> +        /* If fastReconfigure is enabled and we failed in the previous attempt
> +         * in configuring entity expansion, then skip this step.
> +         */
> +        if (tryEntityExpandConfig) {
> +            boolean configured = true;
> +
> +            // Try setting the DeclHandler if entity expansion is off
> +            if (!expand) {
> +                try {
> +                    parser.setProperty(
> +                        "http://xml.org/sax/properties/declaration-handler",
> +                        contentHandler);
> +                } catch (SAXNotSupportedException e) {
> +                    // No declaration handler available
> +                    configured = false;
> +                } catch (SAXNotRecognizedException e) {
> +                    // No declaration handler available
> +                    configured = false;
> +                }
> +            }
> +
> +            /* If unable to configure this property and fastReconfigure is
> +             * enabled, then set up to avoid this code path entirely next time.
> +             */
> +            if (!configured && fastReconfigure) {
> +                tryEntityExpandConfig=false;
>             }
>         }
>     }
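The idea of remembering what the parser supports can be sketched outside JDOM with just the JDK's SAX classes. This is a minimal, hedged illustration, not the patch itself, assuming the JDK's built-in SAX parser; the class CachedLexicalConfig, its fields and its helpers are hypothetical. One design choice differs from the diff above: instead of a single succeeded/failed flag, the sketch remembers which property name worked. Because build() creates a fresh content handler each time, the working property still has to be pointed at the new handler on every build; only the probing of names known to fail is skipped. In the patch as posted, configured is cleared when the legacy name fails even if the standard name then succeeds, so with fastReconfigure enabled on a Xerces parser the lexical handler would stop being refreshed after the first build.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class CachedLexicalConfig {
    static final String[] CANDIDATES = {
        "http://xml.org/sax/handlers/LexicalHandler",    // legacy name
        "http://xml.org/sax/properties/lexical-handler"  // standard SAX 2 name
    };

    // Hypothetical cache state: null = not probed yet, "" = no name worked.
    private String workingName = null;
    int probeFailures = 0;

    /** Points the reused parser's lexical handler at this build's handler. */
    void configure(XMLReader parser, DefaultHandler2 handler) {
        if (workingName == null) {             // first build: probe each name once
            workingName = "";
            for (String name : CANDIDATES) {
                try {
                    parser.setProperty(name, handler);
                    workingName = name;        // remember the name that worked
                    return;
                } catch (SAXException e) {     // not recognized or not supported
                    probeFailures++;
                }
            }
        } else if (!workingName.isEmpty()) {   // later builds: set only the known name
            try {
                parser.setProperty(workingName, handler);
            } catch (SAXException e) {
                throw new IllegalStateException("property support changed", e);
            }
        }                                      // "" means unsupported: skip entirely
    }

    /** Collects comment text; one fresh instance per build, like SAXBuilder's handler. */
    static class CommentCollector extends DefaultHandler2 {
        final List<String> comments = new ArrayList<>();

        @Override
        public void comment(char[] ch, int start, int length) {
            comments.add(new String(ch, start, length));
        }
    }

    /** Parses each document with one reused reader; returns each build's comments. */
    static List<List<String>> parseAll(String... docs) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CachedLexicalConfig config = new CachedLexicalConfig();
        List<List<String>> results = new ArrayList<>();
        for (String doc : docs) {
            CommentCollector handler = new CommentCollector();
            parser.setContentHandler(handler);
            config.configure(parser, handler);  // must still run on every build
            parser.parse(new InputSource(new StringReader(doc)));
            results.add(handler.comments);
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseAll("<a><!--one--></a>", "<b><!--two--></b>"));
    }
}
```

With two fragments parsed through the same reader, each comment should reach only the handler created for that build, while the failing legacy name is attempted just once.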