[jdom-interest] Re: Substituting a different <!DOCTYPE ...> when parsing an XML file

Geoff Rimmer geoff.rimmer at sillyfish.com
Tue Jun 4 04:53:20 PDT 2002


"Sean Huo" <sqh at qad.com> writes:
>
> [Substituting a different <!DOCTYPE ...> when parsing an XML file]
>
> A better solution is to provide an EntityResolver.
>
> In your implementation of the EntityResolver, you have complete control
> over how you want to resolve the DTD reference.
>
> Here is a code fragment for parsing an XML document using JDOM.
>
> SAXBuilder builder = new SAXBuilder(true);
> // provide your own version of EntityResolver
> builder.setEntityResolver(new MyEntityResolver());
> builder.build( ...);

As I understand it, using an EntityResolver for replacing a DOCTYPE is
only possible if you know what DOCTYPE you are looking for.  In other
words, if you know a document contains a DOCTYPE with a particular
system ID, you just create an EntityResolver which returns a
replacement DTD every time it matches this system ID.
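A minimal sketch of that known-system-ID case, using only the SAX
interfaces that ship with the JDK. The class name KnownDtdResolver and
its constructor parameters are made up for illustration; they are not
part of JDOM or SAX.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver: serves a replacement DTD for one known system ID. */
class KnownDtdResolver implements EntityResolver {
    private final String expectedSystemId;
    private final String replacementDtd;

    KnownDtdResolver(String expectedSystemId, String replacementDtd) {
        this.expectedSystemId = expectedSystemId;
        this.replacementDtd = replacementDtd;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (expectedSystemId.equals(systemId)) {
            // The document asked for the DTD we know about: hand back our own copy.
            InputSource source = new InputSource(new StringReader(replacementDtd));
            source.setSystemId(systemId);
            return source;
        }
        // Any other entity: returning null lets the parser resolve it as usual.
        return null;
    }
}
```

An instance of this would be passed to builder.setEntityResolver(), as
in the fragment quoted above.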

But the problem I was referring to in my original post concerns the
following situations:

1. You are reading an XML document which contains a DOCTYPE, but you
   do *not* know what that DOCTYPE is.  In this case, the
   EntityResolver does not know what public/system IDs to look for,
   and so is unable to replace the DOCTYPE.

2. You are reading an XML document that does *not* contain a DOCTYPE
   at all.  In this case, builder.build() will throw an exception
   because it cannot perform validation if there is no DOCTYPE to
   validate against.
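Situation 2 can be reproduced without JDOM. The sketch below uses only
the JAXP parser bundled with the JDK and collects the validation errors
it reports; the class and method names are hypothetical. My assumption
is that errors like these are what SAXBuilder turns into the exception
from builder.build().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NoDoctypeValidationDemo {
    /** Parses the XML with DTD validation switched on and returns the error messages. */
    static List<String> validationErrors(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // request DTD validation
        XMLReader reader = factory.newSAXParser().getXMLReader();

        List<String> errors = new ArrayList<>();
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable, so record them instead of throwing.
                errors.add(e.getMessage());
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return errors;
    }
}
```

Calling validationErrors("<countries/>") should return at least one
error, because the parser has no DOCTYPE to validate against.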

This is why I think JDOM should at the very least provide:

    package org.jdom.input;

    class DocTypeReplacerInputStream extends FilterInputStream
    {
        public DocTypeReplacerInputStream( InputStream is, DocType docType )
        {
            ....
        }

        public int read() throws IOException
        {
            ....
        }
    }

which can be used as follows:

    DocType docType = new DocType(
        "countries", "http://www.sillyfish.com/countries.dtd" );

    Document doc = new SAXBuilder( true ).build(
        new DocTypeReplacerInputStream(
            new FileInputStream( "countries.xml" ), docType ) );

to force validation against a DTD specified by the application.
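To give a rough idea of what the replacement step might involve, here
is a simplified sketch. It is not the proposed FilterInputStream: it
buffers the whole document, assumes UTF-8, takes the root element name
and system ID as plain strings instead of an org.jdom.DocType so that
it runs without JDOM, and only recognises an existing DOCTYPE that has
no internal subset. The name DocTypeReplacer is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DocTypeReplacer {
    // Simplification: matches a DOCTYPE without an internal subset ([...]).
    private static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^>\\[]*>");
    // Matches an XML declaration at the very start of the document.
    private static final Pattern XML_DECL = Pattern.compile("^\\s*<\\?xml\\s[^>]*\\?>");

    /** Removes any existing DOCTYPE and inserts one pointing at systemId. */
    static String replaceDocType(String xml, String rootName, String systemId) {
        String newDocType = "<!DOCTYPE " + rootName + " SYSTEM \"" + systemId + "\">";
        String stripped = DOCTYPE.matcher(xml).replaceFirst("");
        Matcher decl = XML_DECL.matcher(stripped);
        if (decl.find()) {
            // The DOCTYPE has to come after the XML declaration, if there is one.
            return stripped.substring(0, decl.end()) + "\n" + newDocType
                + stripped.substring(decl.end());
        }
        return newDocType + "\n" + stripped;
    }

    /** Stream variant: reads everything, rewrites it, and returns a new stream. */
    static InputStream replaceDocType(InputStream in, String rootName, String systemId)
            throws IOException {
        String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        String rewritten = replaceDocType(xml, rootName, systemId);
        return new ByteArrayInputStream(rewritten.getBytes(StandardCharsets.UTF_8));
    }
}
```

This handles both situations above: an unknown DOCTYPE is dropped, and
a document with no DOCTYPE gains one. A real implementation would need
a proper scan of the prolog (comments, internal subsets, encodings)
rather than a regular expression.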

In addition to providing this DocTypeReplacerInputStream class, I
think the feature is useful enough that the following methods should
be added to class SAXBuilder:

    public Document build( InputStream is, DocType docType );
    public Document build( URL url, DocType docType );
    public Document build( File file, DocType docType );

which would behave the same way as their equivalent versions without
the DocType parameter, except that they would validate against the
specified DocType rather than the one (if any) in the document.  You
could then write code like this:

    DocType docType = new DocType(
        "countries", "http://www.sillyfish.com/countries.dtd" );

    Document doc = new SAXBuilder( true )
        .build( new FileInputStream( "countries.xml" ), docType );

-- 
Geoff Rimmer <> geoff.rimmer at sillyfish.com <> www.sillyfish.com
www.sillyfish.com/phone - Make savings on your BT and Telewest phone calls
UPDATED 09/05/2002: 508 destinations, 12 schemes (with contact details)


