[jdom-interest] xni builder

philip.nelson at omniresources.com
Thu Nov 8 03:47:15 PST 2001


> 1. SAXHandler DTD processing
> 
>       PIs in the external or internal DTD become part of the Document. Is
> it legal for PIs to appear in DTDs? Xerces 2 accepts them. Should PIs in
> the internal subset become part of the doc, copied as String to the
> internal subset, or both?

I don't know!  Probably not both.  My guess is that they should be treated
as comments are and show up wherever they do in the original document.  I
don't think I've ever seen one in a dtd or internal subset but that doesn't
mean much.  Anybody?

> 
>       Comments in the external DTD are copied to the internal DTD subset.
> I can prevent this in XNIBuilder - should I? Can/should SAXHandler

Here is a patch I sent out yesterday to deal with this. Nothing from the
external dtd should get copied to the original document.

> 
>       NOTATIONs in the internal DTD subset are copied as a String minus
> the publicId

Oops, I'll fix it in the patch, thanks.  I added the PUBLIC and SYSTEM
identifiers as well.
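
For illustration, here is a minimal sketch of how a NOTATION declaration with both identifiers could be appended to the internal subset buffer. The class and method names (NotationDecl, append) are hypothetical and this is not the code in the attached patch; it only follows the declaration forms the XML spec allows (PUBLIC with an optional system ID, or SYSTEM alone).

```java
// Hypothetical helper for illustration only; not taken from the attached patch.
final class NotationDecl {
    private NotationDecl() {}

    /**
     * Appends one NOTATION declaration to the buffer, using whichever of
     * the public and system identifiers is non-null.
     */
    static StringBuffer append(StringBuffer buffer, String name,
                               String publicId, String systemId) {
        buffer.append("<!NOTATION ").append(name);
        if (publicId != null) {
            // PUBLIC form: the system identifier is optional for notations.
            buffer.append(" PUBLIC \"").append(publicId).append('"');
            if (systemId != null) {
                buffer.append(" \"").append(systemId).append('"');
            }
        } else if (systemId != null) {
            // SYSTEM form: only a system identifier was declared.
            buffer.append(" SYSTEM \"").append(systemId).append('"');
        }
        return buffer.append(">\n");
    }
}
```

Returning the buffer lets the calls chain in the same style as the existing buffer.append() code.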

> 
>       NOTATIONs in the external DTD are copied to the internal DTD subset
> (again minus public Id)

Fixed

> 
>       If there is no internal DTD subset in the source document, the JDOM
> internal subset is set to the empty string and appears when the document
> is serialised, ie
>          <!DOCTYPE personnel SYSTEM "personal.dtd" [
>          ]>
>       Is this by design, should the subset be set to null, or should the
> outputter not output blank internal subsets?

It probably would look better if the blank subset weren't printed.  I'll
patch XMLOutputter as well when I get a chance.
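
A rough sketch of the check the outputter would need, assuming a standalone helper rather than the real XMLOutputter code. DocTypeWriter and its write method are made-up names, and treating a whitespace-only subset as blank is an assumption on top of what was asked.

```java
// Hypothetical stand-in for the DOCTYPE-printing part of an outputter.
final class DocTypeWriter {
    private DocTypeWriter() {}

    /** Serialises a DOCTYPE, omitting the [ ] block when the subset is blank. */
    static String write(String rootName, String publicId,
                        String systemId, String internalSubset) {
        StringBuffer out = new StringBuffer("<!DOCTYPE ").append(rootName);
        if (publicId != null) {
            // A DOCTYPE with PUBLIC is expected to carry a system ID as well;
            // callers of this sketch are assumed to pass both.
            out.append(" PUBLIC \"").append(publicId).append('"');
            if (systemId != null) {
                out.append(" \"").append(systemId).append('"');
            }
        } else if (systemId != null) {
            out.append(" SYSTEM \"").append(systemId).append('"');
        }
        // Assumption: null, empty, and whitespace-only subsets all count as blank.
        boolean hasSubset = internalSubset != null
                && internalSubset.trim().length() > 0;
        if (hasSubset) {
            out.append(" [\n").append(internalSubset);
            if (!internalSubset.endsWith("\n")) {
                out.append('\n');
            }
            out.append(']');
        }
        return out.append('>').toString();
    }
}
```

With the empty-string subset from the example above, this would print just <!DOCTYPE personnel SYSTEM "personal.dtd">.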

> 
> 
> 2. XNI Builder
> 
>       Building the internal DTD means I have copied all the
> buffer.append() in SAXHandler. I've moved this into a BuilderHelper class
> which just contains static methods to append the string representation of
> comments etc to a buffer.

I'm curious about the signatures.  Here is a wild idea.  What about putting
these as instance methods on JDOMFactory?  Then add a public Object
getInternalSubset() method.  We would assume by default this would return a
StringBuffer, but some adventurous folks could subclass and return a
different version if they wanted.  toString() would have to give the version
we expect for XMLOutputter.  Then DocType would have to change to accept
Object instead of String.
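
To make the idea concrete, here is a sketch under the assumption that the append methods live on a factory-like interface. Every type and method name below is hypothetical except getInternalSubset(), which is the method proposed above; nothing here is existing JDOM API.

```java
// Hypothetical sketch of the proposal; none of these types exist in JDOM.
interface SubsetBuildingFactory {
    /** Appends the string form of a comment seen in the internal subset. */
    void appendComment(String text);

    /** Returns the accumulated subset; toString() must give the DTD text. */
    Object getInternalSubset();
}

// Default behaviour: accumulate the subset in a StringBuffer.
class StringBufferSubsetFactory implements SubsetBuildingFactory {
    private final StringBuffer subset = new StringBuffer();

    public void appendComment(String text) {
        subset.append("<!--").append(text).append("-->\n");
    }

    public Object getInternalSubset() {
        return subset;
    }
}

// Stand-in for a DocType that accepts Object instead of String.
class SketchDocType {
    private Object internalSubset;

    void setInternalSubset(Object internalSubset) {
        this.internalSubset = internalSubset;
    }

    /** What an outputter would print: whatever toString() returns. */
    String getInternalSubsetText() {
        return internalSubset == null ? null : internalSubset.toString();
    }
}
```

A subclass could return a richer object model from getInternalSubset(), as long as its toString() still produces the declaration text.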

>       NOTATIONs don't appear to be represented in JDOM? (No Notation class)

Elliotte had a proposal on the table which I'll try to recreate from memory.
His idea was to give each Attribute an AttributeType so an unparsed entity
would have the notation type, and some of the other standard dtd types would
be represented as well.  In my mind the downside is that if these types are
added in code, we will have to parse the internal subset and dtd and any
declared parameter entities to see if they must be declared or not.  He may
have a better idea.  I also don't like the idea of a primarily unused piece
of data on attributes that could grow the size of the document unnecessarily.
A subclass of Attribute could help there, a TypedAttribute or some such.
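
A sketch of what the subclass approach might look like, so that only attributes that need a type pay for the extra field. SketchAttribute stands in for JDOM's Attribute, and AttributeType and its constants are assumptions based on the standard DTD attribute types, not Elliotte's actual proposal.

```java
// Hypothetical types for illustration; not part of JDOM.
enum AttributeType {
    UNDECLARED, CDATA, ID, IDREF, IDREFS, ENTITY, ENTITIES,
    NMTOKEN, NMTOKENS, NOTATION, ENUMERATION
}

// Stand-in for the plain attribute class: no type field, no extra memory.
class SketchAttribute {
    private final String name;
    private final String value;

    SketchAttribute(String name, String value) {
        this.name = name;
        this.value = value;
    }

    String getName() { return name; }

    String getValue() { return value; }
}

// Only attributes built from a DTD-aware source carry the declared type.
class TypedAttribute extends SketchAttribute {
    private final AttributeType type;

    TypedAttribute(String name, String value, AttributeType type) {
        super(name, value);
        this.type = type;
    }

    AttributeType getType() { return type; }
}
```

A builder that has DTD information could create TypedAttribute instances, while attributes created by hand would stay plain.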

Elliotte?

> 
>       Here's my current list of todos.
>          complete internal DTD subset stuff.
>          error handling
>          add all the build() methods from SAXBuilder
>          accessor methods for ignorable whitespace etc as per SAXBuilder
>          split Builder/Handler
>          possibly just add the handlers to an XNI parser configuration (no
>          need to extend XMLDocumentParser according to Andy Clark)
>          factories
>          javadoc
> 
>       How can I test this? For the first stab I'm aiming to build pretty
> much what SAX builder does. Is there a document comparison app that will
> compare in detail two JDOM documents? Then I could build with SAXBuilder,
> build with XNIBuilder, compare the resulting Documents.

I have been on the lookout for a decent way to do canonical xml with JDOM.
If we could do this, we could use the xml conformance tests from NIST/OASIS.
A new project was just announced on xmlhack but it's probably DOM based and
our DOMOutputter hasn't been through massive testing yet.

Alex Chaffee was supposedly working on a bunch of tests for XMLOutputter.
If they were there you could round trip the documents and see if the output
matched.  Alex is on another hiatus though.

I think Brett had an article at developerWorks that offered a way to do
something like this with SAX, as I recall.

> 
> How was it envisaged access to the xml decl would be achieved - attributes
> on Document? an XMLDecl class which is an attribute on Document?

In our last discussion of this, we didn't have any way to build it so we left
it out.  This would still be the case when not using Xerces but I'm open to
suggestions.  Actually, I have almost no opinion ;-)  It comes up from time
to time when somebody wants to preserve the encoding from the original
document.

> What happens when a JDOM doc is built from SAXBuilder or by hand - default
> values or null values. What would be the logic for outputters to generate
> the xml decl?
> 
> 
> Also, was there an intention to provide access to attribute datatypes?

See above.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SAXHandler.java.diff
Type: application/octet-stream
Size: 3338 bytes
Desc: not available
Url : http://jdom.org/pipermail/jdom-interest/attachments/20011108/64b0b348/SAXHandler.java.obj

