[jdom-interest] Entity Resolver Cache/Catalog

Paul Libbrecht paul at activemath.org
Mon Aug 29 08:15:25 PDT 2011


On 29 August 2011 at 16:58, Rolf Lear wrote:

> This is further compounded by there being some restrictions on some
> documents too, like the w3.org 'ban' on default Java user-agents:
> http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/
> 
> My experimentation indicates that w3.org has put a blanket 'tarpit' of 30
> seconds on any connection, regardless of what User-agent you use. This is
> 'significant'.

Definitely, W3C wants you to stop referencing DTDs by their URLs.
Well... it wants parsers to stop fetching and parsing them over and over.

> Typical solutions to this problem are things like OASIS catalogs, etc. but
> that feels heavy-weight... or, is it?

I believe it was not very hard to configure the Xerces that ships with Java to use catalogs.
And I would like JDOM to encourage this by showing good practice.

Here's what I used before SAX parsing:

>             SAXParserFactory factory = SAXParserFactory.newInstance();
>             // Select the grammar-caching parser configuration via a system property.
>             System.setProperty("com.sun.org.apache.xerces.xni.parser.XMLParserConfiguration",
>                     "com.sun.org.apache.xerces.parsers.XMLGrammarCachingConfiguration");
>             SAXParser parser = factory.newSAXParser();
> 
>             // Resolve DTDs through an OASIS catalog loaded as a resource next to this class.
>             XMLCatalogResolver resolver = new XMLCatalogResolver();
>             resolver.setPreferPublic(true);
>             resolver.setCatalogList(new String[]{this.getClass().getResource("xmlCatalog.xml").toExternalForm()});
>             handler = new EventDeserializerSAXHandler(resolver);
>             if(LOG.isDebugEnabled()) LOG.debug("Starting parser.");
>             parser.parse(inputStream, handler);

Caching, however, comes for free with a single system property (for the lifetime of the VM), if I remember correctly.

It would be cool to have SAXBuilder.setCatalog to make JDOM a good citizen!
(Or even better: SAXBuilder.addCatalogEntry(public, URL), with a javadoc example where the URL comes from class.getResource().)
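
To make the idea concrete, here is a rough sketch of what such a catalog entry could boil down to, using nothing but the SAX interfaces in the JDK. The class name CatalogEntryResolver and its addCatalogEntry method are hypothetical (they are not part of JDOM or Xerces); only EntityResolver and InputSource are real APIs. It assumes the DTD is looked up by its public ID and that a local copy is reachable through a URL, for instance one obtained from class.getResource().

```java
import java.io.IOException;
import java.net.URL;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical helper: maps DTD public IDs to local copies so that the
 * parser does not have to fetch them from the network.
 */
class CatalogEntryResolver implements EntityResolver {

    // public ID -> URL of the local copy (e.g. from SomeClass.class.getResource(...))
    private final Map<String, URL> entries = new ConcurrentHashMap<>();

    /** Registers a local copy for the given public ID. */
    public void addCatalogEntry(String publicId, URL localCopy) {
        entries.put(publicId, localCopy);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        URL local = (publicId == null) ? null : entries.get(publicId);
        if (local == null) {
            // Unknown entity: returning null lets the parser fall back
            // to its default behaviour (opening the system ID itself).
            return null;
        }
        InputSource source = new InputSource(local.openStream());
        source.setPublicId(publicId);
        // Report the local URL as system ID so relative references
        // inside the DTD resolve against the local copy.
        source.setSystemId(local.toExternalForm());
        return source;
    }
}
```

Such a resolver can be handed to any SAX parser through XMLReader.setEntityResolver, and JDOM already accepts one through SAXBuilder.setEntityResolver, so a convenience method on SAXBuilder would mostly be a thin wrapper around this.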

paul
also often developing on the train ;-)



