[jdom-interest] XML Schema classification help
Michael Kay
mike at saxonica.com
Wed Jan 4 15:05:12 PST 2012
On 04/01/2012 19:11, cliff palmer wrote:
> I need to examine XML documents contained in multiple columns in a
> database table with over a million rows and identify each of the
> different structures used for the XML data, producing a count of the
> number of instances that use each structure.
>
> I thought of using SAXParser to build a list of the XML element names
> in the order they occur, storing each unique list, and accumulating a
> count whenever a document's list matches one already encountered, but
> I am hoping there is a less cumbersome approach.
>
> I would appreciate any and all suggestions.
>
You've chosen an odd place to ask the question, since there's nothing
specific in JDOM that will help you.
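For the approach described in the question, the standard JAXP SAX API in the JDK is enough; nothing from JDOM is needed. The following is a minimal sketch, assuming the XML values have already been read from the table into Java strings (the JDBC part is omitted) and that "structure" means the sequence of element names in document order. The class and method names (StructureCounter, countStructures) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StructureCounter {
    // SAX handler that records each element's qualified name in document order.
    static final class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(qName);
        }
    }

    // Structure key: element names in document order, joined with commas.
    static String structureKey(SAXParser parser, String xml) throws Exception {
        NameCollector collector = new NameCollector();
        parser.parse(new InputSource(new StringReader(xml)), collector);
        return String.join(",", collector.names);
    }

    // Counts how many documents share each structure key; the parser is
    // reused sequentially, one document at a time (insertion order kept).
    public static Map<String, Integer> countStructures(Iterable<String> documents) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String xml : documents) {
            counts.merge(structureKey(parser, xml), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of(
                "<doc><p/><p/><img/></doc>",
                "<doc><p/><p/><img/></doc>",
                "<doc><p/><p/><p/></doc>");
        // Prints {doc,p,p,img=2, doc,p,p,p=1}
        System.out.println(countStructures(docs));
    }
}
```

Because this key records every element, two documents that differ only in how many p elements they contain end up in different buckets; whether that is what you want depends entirely on the classification rules.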
The key thing you need to do is define the rules for your
taxonomy. Presumably it's something more complex than categorizing
documents by the name of their root element, or by the namespaces they use.
But presumably a document with four paragraphs and two images and one
with five paragraphs and no images go in the same bucket. So what are
the rules?
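To illustrate how much the chosen rule matters, here is a sketch of the coarsest rule mentioned above: bucket each document by its root element name plus the set of namespace URIs its elements use. It uses only JAXP SAX with namespace awareness switched on; the class and method names (CoarseClassifier, classify, bucketKey) are hypothetical, and this is offered only as one possible starting rule, not as the right taxonomy.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CoarseClassifier {
    // Records the root element and every namespace URI used on an element.
    static final class RootAndNamespaces extends DefaultHandler {
        String root;
        final TreeSet<String> namespaces = new TreeSet<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (root == null) {
                // Clark notation {uri}local, or just the local name if no namespace
                root = uri.isEmpty() ? localName : "{" + uri + "}" + localName;
            }
            if (!uri.isEmpty()) {
                namespaces.add(uri);
            }
        }
    }

    // Bucket key: root element plus the sorted set of element namespace URIs.
    static String bucketKey(SAXParser parser, String xml) throws Exception {
        RootAndNamespaces handler = new RootAndNamespaces();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.root + " " + handler.namespaces;
    }

    public static Map<String, Integer> classify(Iterable<String> documents) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so uri and localName are reported
        SAXParser parser = factory.newSAXParser();
        Map<String, Integer> counts = new TreeMap<>();
        for (String xml : documents) {
            counts.merge(bucketKey(parser, xml), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String fourParasTwoImages = "<doc><p/><p/><p/><p/><img/><img/></doc>";
        String fiveParasNoImages = "<doc><p/><p/><p/><p/><p/></doc>";
        // Prints {doc []=2}: both documents fall into one bucket
        System.out.println(classify(List.of(fourParasTwoImages, fiveParasNoImages)));
    }
}
```

Under this rule the document with four paragraphs and two images and the one with five paragraphs and no images land in the same bucket, which an element-by-element list would not do; the price is that genuinely different documents sharing a root element and namespaces are merged too.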
Michael Kay
Saxonica