[jdom-interest] Parsing a MODS-document with validation fails

Thomas Scheffler thomas.scheffler at uni-jena.de
Fri Jul 22 13:21:33 PDT 2011


On 22.07.2011 22:12, Jason Hunter wrote:
> Thanks, Thomas.  I'll integrate it.

Thank you for responding and integrating it.

regards,

Thomas

> On 21.07.2011 10:14, Thomas Scheffler wrote:
>>> On 21.07.2011 04:18, Bradley S. Huffman wrote:
>>>> Which version of JDOM?  My first guess is it is something in XMLOutputter.
>>> This is the latest release, 1.1.1. I do not suspect XMLOutputter here, as it usually has no problems with namespaces. This looks like a parsing issue.
>> It is a bug in the SAXHandler class: an attribute in a different namespace is distinguished only by its QName and not by its namespace URI. I attached a patch that fixes this bug.
>> It would be great if this could be integrated and released soon in a version 1.1.2.
>>
>> regards
>>
>> Thomas Scheffler
>>
>>>> On Wed, Jul 20, 2011 at 8:23 AM, Thomas Scheffler
>>>> <thomas.scheffler at uni-jena.de>   wrote:
>>>>> Hi,
>>>>>
>>>>> When I parse a valid MODS document with XML Schema validation, JDOM changes
>>>>> attributes because it handles schema default values incorrectly (it ignores
>>>>> the namespace).
>>>>>
>>>>> Here is a short code snippet that demonstrates this:
>>>>>
>>>>> SAXBuilder builder = new SAXBuilder(true);
>>>>> builder.setFeature("http://xml.org/sax/features/namespaces", true);
>>>>> builder.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
>>>>> builder.setFeature("http://apache.org/xml/features/validation/schema", true);
>>>>>
>>>>> Document document = builder.build(new URL("http://academiccommons.columbia.edu/download/fedora_content/show_pretty/ac:111060/CONTENT/ac111060_description.xml"));
>>>>> XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
>>>>> xout.output(document, System.out);
>>>>>
>>>>> Here is a result fragment:
>>>>>
>>>>> <name type="simple">
>>>>> <namePart type="family">Edwards</namePart>
>>>>> <namePart type="given">Stephen A.</namePart>
>>>>> <role>
>>>>> <roleTerm type="text">author</roleTerm>
>>>>> </role>
>>>>> <affiliation>Columbia University. Computer Science</affiliation>
>>>>> </name>
>>>>>
>>>>> If you look at the original document, you can see that @type of name is
>>>>> "personal". The "simple" comes from the xlink XML Schema that is included
>>>>> by the MODS schema. Therefore the result fragment should look like this:
>>>>>
>>>>> <name type="personal" xlink:type="simple">
>>>>> <namePart type="family">Edwards</namePart>
>>>>> <namePart type="given">Stephen A.</namePart>
>>>>> <role>
>>>>> <roleTerm type="text">author</roleTerm>
>>>>> </role>
>>>>> <affiliation>Columbia University. Computer Science</affiliation>
>>>>> </name>
>>>>>
>>>>> If I use DOM from Java, this is handled correctly (though the output is a bit
>>>>> ugly, as it does not reuse the namespace prefix already defined).
>>>>>
>>>>> Could someone just fix this, please?
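
As a rough illustration of the QName versus namespace-URI distinction described in the patch note quoted above, here is a minimal, JDK-only sketch. It is not the JDOM SAXHandler code and not the attached patch; the class AttributeNamespaceSketch and both helper methods are hypothetical names made up for this example. It also assumes that a validating parser may report a schema-defaulted attribute with a namespace URI but without a prefix in its QName, which is only a guess at the situation in this thread.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical illustration only; not taken from JDOM.
final class AttributeNamespaceSketch {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Decides by QName alone: a QName without a prefix looks un-namespaced.
    static boolean namespacedByQName(Attributes atts, int i) {
        return !atts.getQName(i).equals(atts.getLocalName(i));
    }

    // Decides by the namespace URI that SAX reports for the attribute.
    static boolean namespacedByUri(Attributes atts, int i) {
        String uri = atts.getURI(i);
        return uri != null && uri.length() > 0;
    }

    public static void main(String[] args) {
        AttributesImpl atts = new AttributesImpl();
        // The attribute written in the document: no namespace.
        atts.addAttribute("", "type", "type", "CDATA", "personal");
        // Assumption: the defaulted xlink attribute arrives with a
        // namespace URI but with no prefix in its QName.
        atts.addAttribute(XLINK_NS, "type", "type", "CDATA", "simple");

        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println(atts.getValue(i)
                    + " byQName=" + namespacedByQName(atts, i)
                    + " byUri=" + namespacedByUri(atts, i));
        }
    }
}
```

With these inputs the QName check treats both attributes as un-namespaced, so they would end up under the same name, while the URI check keeps them apart.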

