[jdom-interest] Parsing a MODS-document with validation fails

Thomas Scheffler thomas.scheffler at uni-jena.de
Thu Jul 21 01:14:36 PDT 2011


On 21.07.2011 04:18, Bradley S. Huffman wrote:
> Which version of JDOM?  My first guess is it is something in XMLOutputter.

This is the latest and greatest 1.1.1. I would not suspect XMLOutputter 
here as it usually does not have any problems with namespaces. This 
seems to be a parsing issue.
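
To see what the parser itself reports, a minimal SAX sketch along these lines can help. The class name AttributeNamespaceCheck and the helper describeAttributes are only illustrative, and the sketch uses an inline fragment instead of the real MODS document and schema, so it does not exercise schema default values. It only shows that a namespace-aware parser reports type and xlink:type as two distinct attributes:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of JDOM.
final class AttributeNamespaceCheck {

    // Returns one "{namespaceURI}localName=value" entry per attribute
    // that the SAX parser reports in startElement().
    static List<String> describeAttributes(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        final List<String> seen = new ArrayList<String>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    seen.add("{" + atts.getURI(i) + "}"
                            + atts.getLocalName(i) + "=" + atts.getValue(i));
                }
            }
        });
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Inline stand-in for the <name> element of the MODS document.
        String xml = "<name xmlns:xlink=\"http://www.w3.org/1999/xlink\""
                + " type=\"personal\" xlink:type=\"simple\"/>";
        for (String entry : describeAttributes(xml)) {
            System.out.println(entry);
        }
    }
}
```

If the same kind of handler, run against the real document with schema validation switched on, still reports both attributes with different namespace URIs, then the two are presumably being merged somewhere after the parser.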

regards

Thomas Scheffler

>
> On Wed, Jul 20, 2011 at 8:23 AM, Thomas Scheffler
> <thomas.scheffler at uni-jena.de>  wrote:
>> Hi,
>>
>> if I parse a valid MODS document with XML Schema validation, JDOM changes
>> attributes because it handles default values from the schema incorrectly
>> (by ignoring the namespace).
>>
>> Here is a short code snippet to demonstrate this:
>>
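>> import java.net.URL;
>> import org.jdom.Document;
>> import org.jdom.input.SAXBuilder;
>> import org.jdom.output.Format;
>> import org.jdom.output.XMLOutputter;
>>
>> // Validating builder with namespace handling and XML Schema validation enabled
>> SAXBuilder builder = new SAXBuilder(true);
>> builder.setFeature("http://xml.org/sax/features/namespaces", true);
>> builder.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
>> builder.setFeature("http://apache.org/xml/features/validation/schema", true);
>>
>> // Parse the MODS document and print it back out
>> Document document = builder.build(new URL("http://academiccommons.columbia.edu/download/fedora_content/show_pretty/ac:111060/CONTENT/ac111060_description.xml"));
>> XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
>> xout.output(document, System.out);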
>>
>> Here is a result fragment:
>>
>> <name type="simple">
>> <namePart type="family">Edwards</namePart>
>> <namePart type="given">Stephen A.</namePart>
>> <role>
>> <roleTerm type="text">author</roleTerm>
>> </role>
>> <affiliation>Columbia University. Computer Science</affiliation>
>> </name>
>>
>> If you look at the original document, you can see that @type of name is
>> "personal". The "simple" comes from the xlink XML Schema that was included
>> by the MODS schema. Therefore the result fragment should look like this:
>>
>> <name type="personal" xlink:type="simple">
>> <namePart type="family">Edwards</namePart>
>> <namePart type="given">Stephen A.</namePart>
>> <role>
>> <roleTerm type="text">author</roleTerm>
>> </role>
>> <affiliation>Columbia University. Computer Science</affiliation>
>> </name>
>>
>> If I use DOM from Java, this is done correctly (though the output is a bit
>> ugly, as it does not reuse the namespace prefix already defined).
>>
>> Could someone just fix this, please?


-- 
Thomas Scheffler
Friedrich-Schiller-Universität Jena
Thüringer Universitäts- und Landesbibliothek
Bibliotheksplatz 2
07743 Jena
Phone: ++49 3641 940027
FAX:   ++49 3641 940022

