[jdom-interest] Parsing a MODS-document with validation fails

Rolf Lear jdom at tuis.net
Wed Aug 10 08:40:27 PDT 2011


Hi Thomas.

I don't want you to think I just 'dismissed' your patch(es). I know 
you've replied to the other mail as well, but bear with me...

This particular problem is/was very hard to understand the way it was 
first presented. The MODS Schema is 'not simple', the XML is complex, 
it's all just 'murky'.

Further, given my poor understanding of the problem, it was even harder 
to understand your 'fix', especially since it involved 'maintaining' a 
set of new structures.

I think that I, like I imagine most people who looked at the problem, 
quickly became overwhelmed by the sheer volume of XML and XMLSchema. 
In fact, I don't think many people had a look at all, simply because, 
what's the expression... "TL;DR".

My intention was not so much to 'fix' the problem, but to understand it. 
All I really did was 'create a simple testcase'....

Once I had the 'simple' test case, it sort of 'crystallized' for me, and 
I could understand your fix better. I had some concerns still, though. 
For example, at first I could not see how your patch was dealing with 
namespace prefixes being redeclared...
... then, when I figured it out, I looked at the nsURIMapping map, and I 
decided that it was buggy, in the sense that there can be multiple 
prefixes for a single URI, and that your patch only 'preserves' one 
(though I may be wrong).
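
To make the multiple-prefix concern concrete, here is a minimal sketch. It 
assumes a plain URI-to-prefix map, which is only a guess at how 
nsURIMapping is shaped and may not match the patch. The class and method 
names are hypothetical, and the two XLink prefixes are just illustrative 
data.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class PrefixMapSketch {

    // Assumed shape: each URI remembers exactly one prefix.
    // Each declaration is a {prefix, uri} pair.
    public static Map<String, String> singleValued(String[][] declarations) {
        Map<String, String> uriToPrefix = new LinkedHashMap<>();
        for (String[] d : declarations) {
            // A later prefix for the same URI silently replaces the earlier one.
            uriToPrefix.put(d[1], d[0]);
        }
        return uriToPrefix;
    }

    // Alternative: each URI keeps every prefix that was bound to it.
    public static Map<String, Set<String>> multiValued(String[][] declarations) {
        Map<String, Set<String>> uriToPrefixes = new LinkedHashMap<>();
        for (String[] d : declarations) {
            uriToPrefixes.computeIfAbsent(d[1], k -> new LinkedHashSet<>()).add(d[0]);
        }
        return uriToPrefixes;
    }

    public static void main(String[] args) {
        String[][] decls = {
            {"xlink", "http://www.w3.org/1999/xlink"},
            {"xl", "http://www.w3.org/1999/xlink"},
        };
        System.out.println(singleValued(decls)); // one prefix survives
        System.out.println(multiValued(decls));  // both prefixes survive
    }
}
```

With the single-valued map, whichever prefix was declared last wins, so an 
attribute that was written with the other prefix can no longer be matched 
back to it.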

So, I understand now what your patch is doing... and, I figured that for 
an 'edge case', the code was duplicating a lot of effort that was 
already embedded in the Element hierarchy. Also, it is 'touching' a lot 
of code, and introduces a significant overhead in the 'normal' usage. I 
just figured that we could use the existing structures in the JDOM 
hierarchy better than having to re-invent all the Namespace tracking that 
needs to be done to 'do it right'.
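
As a rough illustration of 'using the existing structures', the sketch 
below walks from an element up through its parents looking for a usable 
prefix, while remembering prefixes that were redeclared closer to the 
starting point. It is not the JDOM code: Node is a hypothetical stand-in 
for an element that knows its parent and its own namespace declarations, 
and findPrefix is a name made up for this example.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PrefixLookupSketch {

    /** Hypothetical stand-in for an element: a parent plus declared prefixes. */
    public static final class Node {
        final Node parent;
        final Map<String, String> declared = new LinkedHashMap<>(); // prefix -> URI

        public Node(Node parent) {
            this.parent = parent;
        }

        public Node declare(String prefix, String uri) {
            declared.put(prefix, uri);
            return this;
        }
    }

    /**
     * Walks from start towards the root and returns the first non-empty
     * prefix bound to uri that has not been redeclared nearer to start.
     * Returns null when no such prefix is in scope.
     */
    public static String findPrefix(Node start, String uri) {
        Set<String> overrides = new HashSet<>();
        for (Node n = start; n != null; n = n.parent) {
            for (Map.Entry<String, String> e : n.declared.entrySet()) {
                String prefix = e.getKey();
                // A namespaced attribute needs a real prefix, so skip "".
                if (!prefix.isEmpty()
                        && !overrides.contains(prefix)
                        && uri.equals(e.getValue())) {
                    return prefix;
                }
            }
            // Prefixes declared here shadow the same prefixes further up.
            overrides.addAll(n.declared.keySet());
        }
        return null;
    }

    public static void main(String[] args) {
        Node root = new Node(null).declare("a", "urn:one").declare("b", "urn:one");
        Node child = new Node(root).declare("a", "urn:two");
        // "a" is rebound on child, so only "b" can still reach urn:one.
        System.out.println(findPrefix(child, "urn:one"));
    }
}
```

The cost is proportional to the depth of the element and the number of 
declarations on the way up, and nothing extra has to be maintained while 
the document is being built.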

Finally, the 'value' of my submission lies more in the test case than 
in the actual fix: the test case (at least for me) makes a complicated 
problem a little easier to understand and reproduce. This is the 'beauty' of 
open source, the sense that we can all look at the issue, and come up 
with a 'better' answer. I am sure (at least I hope) that a lot of people 
will carefully scrutinize whatever fix is finally applied.

Rolf


On 10/08/2011 2:32 AM, Thomas Scheffler wrote:
> Hi,
>
> sorry, I seem to be missing a mail from the list or something. I do not 
> have the full code here for a proper review. But from the mail it seems 
> that you are using tree traversal to get the correct namespace prefix. 
> This looks a bit silly to me, as it can be a lot of work on large 
> XML files, although it should work, too.
>
> Why do you not use my second patch with the startPrefixMapping() and 
> endPrefixMapping()? For me it looks better to store this information 
> in a proper place and throw it away after endPrefixMapping() event, 
> than walking up the tree.
>
> But again, I am missing the code you both are talking about.
>
> regards,
>
> Thomas
>
> On 10.08.2011 01:14, Jason Hunter wrote:
>> // First look at the element itself
>>
>>>     if (p.getNamespace().getURI().equals(attURI)
>>>             && !overrides.contains(p.getNamespacePrefix())
>>>             && !"".equals(element.getNamespace().getPrefix())) {
>>>         // we need a prefix. It's impossible to have a namespaced
>>>         // attribute if there is no prefix for that attribute.
>>>         attNS = p.getNamespace();
>>>         break uploop;
>>>     }
>>
>> // Then any additional namespaces defined on the element
>>
>>>     overrides.add(p.getNamespacePrefix());
>>>     for (Iterator it = p.getAdditionalNamespaces().iterator();
>>>             it.hasNext(); ) {
>>>         Namespace ns = (Namespace)it.next();
>>>         if (!overrides.contains(ns.getPrefix())
>>>                 && attURI.equals(ns.getURI())) {
>>>             attNS = ns;
>>>             break uploop;
>>>         }
>>>         overrides.add(ns.getPrefix());
>>>     }
>>
>> // If we haven't hit something yet, keep walking up the tree
>>
>>>     if (p == element) {
>>>         p = currentElement;
>>>     } else {
>>>         p = p.getParentElement();
>>>     }
>>> } while (p != null);