[jdom-interest] JDOM Issue #5 - DTD-aware Attribute output

Rolf Lear jdom at tuis.net
Fri Mar 23 06:21:41 PDT 2012

Hi Paul.

If you were wondering why no-one on the list has commented, it may be 
because you never sent it to the list, just to me ... ;-), so I have 
CC'd the list for you...

Anyway, I have been looking into things, and I think the problem is 
that you have missed a detail in the way the data is processed.

Using your example document:

This document (apart from being 'big'), refers to a single DTD, which, 
in the case of this document, only really defaults one attribute: 
'scheme' on the 'competency' element (which defaults to "PISA").

Now, as far as I know, there are only the following ways to reference 
content of the DTD:

If you are doing no DTD validation, the DTD will still be accessed to 
resolve entity references. But that is the *only* thing that will be 
pulled from the DTD.

If you do validation, then the entire DTD is read, the validation is 
done, and any attributes defaulted in the DTD will be created in the XML.

So, it is my understanding that it is impossible to have 'all the 
defaulted attributes' without also having done the full DTD Validation.
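
For what it's worth, the validating case can be checked without JDOM at all. 
The following is a minimal sketch, assuming the SAX parser bundled with the 
JDK hands the handler an org.xml.sax.ext.Attributes2 (the sketch fails loudly 
if it does not). The class name, the inline document, and its 'level' 
attribute are made up for illustration; only the 'scheme' default of "PISA" 
mirrors the DTD discussed here. It parses with validation on and records, for 
each attribute, whether it was written in the document or supplied by the DTD.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

public class SpecifiedAttributeCheck {

    // Hypothetical stand-in document: the internal DTD defaults 'scheme'
    // to "PISA"; 'level' is an invented attribute written in the document.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE competency [\n"
        + "  <!ELEMENT competency (#PCDATA)>\n"
        + "  <!ATTLIST competency scheme CDATA \"PISA\"\n"
        + "                       level  CDATA #IMPLIED>\n"
        + "]>\n"
        + "<competency level=\"2\">think</competency>";

    // Maps attribute name -> true if it was written in the document,
    // false if the parser reports it as supplied by the DTD.
    static Map<String, Boolean> attributeOrigins() {
        final Map<String, Boolean> specified = new LinkedHashMap<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // full DTD validation
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (!(atts instanceof Attributes2)) {
                        // Assumption violated: this parser does not expose the flag.
                        throw new IllegalStateException("parser does not provide Attributes2");
                    }
                    Attributes2 atts2 = (Attributes2) atts;
                    for (int i = 0; i < atts2.getLength(); i++) {
                        specified.put(atts2.getQName(i), atts2.isSpecified(i));
                    }
                }

                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // treat validation errors as fatal
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return specified;
    }

    public static void main(String[] args) {
        System.out.println(attributeOrigins());
    }
}
```

Attributes reported as false are the ones the DTD supplied; an outputter that 
wants only what was written in the document can skip them.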

As it happens, I often use the tool 'xmllint' (available on most unix 
systems, including linux) to check my understanding, and I may be wrong 
on this, because xmllint has the argument --dtdattr, which appears to load 
the defaulted attrs from the DTD without doing a full validation...

Anyway, the point is that, using JDOM and standard SAX parsing, the 
only time you could have had 'all the defaulted attrs' was when you were 
doing full validation anyway... and that full validation fails.

So, if you do not do validating, you will not get the 'scheme' 
attributes, and you will not output the scheme attributes (you do not 
have them to output...).

If you do validating, then you have the scheme attributes, and then you 
can now choose to ignore them on the output with the new Format setting.

Your particular problem is confusing to me, and there must be something 
I am missing.... I can't figure out why you think you are getting all 
the defaulted attributes when it is clear you are not validating...

So, that is my first issue, and I think it means that you are confused 
too ;-)

The second issue with the namespace declarations is also confusing to 
me. In your example document, every single namespace declaration is 
essential.... not a single one is 'redundant'.

Is it possible that it is just a bad example?

Anyway, in the worst case, I have a hack that would probably 
make you happy, but makes me cringe.... I would rather understand your 
problem properly before I suggest it.



On 22/03/2012 4:27 PM, Paul Libbrecht wrote:
> Hello list,
> Rolf has been so kind to show me how JDOM issue #5 can be run.
> So I ran the following snippet:
>          SAXBuilder builder = new SAXBuilder(XMLReaders.DTDVALIDATING);
>          Document doc = builder.build(new URL(args[0]));
>          Format speconly = Format.getRawFormat();
>          speconly.setSpecifiedAttributesOnly(true);
>          XMLOutputter xout = new XMLOutputter(speconly);
>          xout.output(doc, System.out);
> which allows me to parse a JDOM source, make modifications (typically: refactorings), then output with almost no difference.
> The big advantage to that is that the attributes that were not there... are simply not injected from the DTD.
> This is enormous in some XML editing tradition which uses implied values a lot.
> There's two BUT:
> 1) This currently fails if the validation fails and this is a big problem to me.
> An example file would be the following:
>    http://svn.activemath.org/LeAM-calculus/LeAM_calculus/oqmath/contin.oqmath
> which references a DTD nearby. This is a manually edited file.
> Removing the validation, sadly disables the passing of attribute presence info, it seems.
> Rolf, is there a way that the attribute presence info is passed but the validation is not stopped?
> 2) namespace declarations, which are kind of attributes, still resurface. They should be avoided if not present ideally. Doable?
> The approach of Rolf is better than the one I had because mine was simply checking in the DTD if the attribute was provided by it and, if yes, removing its output while in Rolf's approach, an attribute that is there is output if... it was there, simply!
> Thanks for comments.
> paul
