[jdom-interest] Partial Tree building/instantiation --- XPath Filter

philip.nelson at omniresources.com
Mon Apr 2 14:47:16 PDT 2001


Probably using an XMLFilter is the easiest idea. SAXBuilder has a method
which allows you to set a customized handler that implements XMLFilter.
With this you can override startElement and endElement and filter out all
but the elements you want. There could still be an issue with the size
though. Is the front end very interactive, and if so, is a full-scan text
search actually practical?
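
A rough sketch of what such a filter might look like, using only the SAX
classes that ship with the JDK. The class name KeepOnlyFilter, the helper
namesSeen, and the rule "keep the root plus every subtree whose element name
is in a wanted set" are my own assumptions for illustration, not anything
JDOM provides. With JDOM you would presumably hand the filter to SAXBuilder
(I believe the method is setXMLFilter, but check your version) instead of
attaching a ContentHandler yourself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: forwards the root element and every subtree whose
// root element name is in `wanted`; all other events are dropped.
class KeepOnlyFilter extends XMLFilterImpl {
    private final Set<String> wanted;
    private int depth = 0;      // current element nesting depth
    private int keepDepth = 0;  // > 0 while inside a wanted subtree

    KeepOnlyFilter(XMLReader parent, Set<String> wanted) {
        super(parent);
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        boolean isRoot = (depth == 0);
        depth++;
        if (keepDepth > 0 || wanted.contains(qName)) {
            keepDepth++;
            super.startElement(uri, local, qName, atts);
        } else if (isRoot) {
            // Always pass the root so the downstream tree stays well-formed.
            super.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        depth--;
        if (keepDepth > 0) {
            keepDepth--;
            super.endElement(uri, local, qName);
        } else if (depth == 0) {
            super.endElement(uri, local, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (keepDepth > 0) {
            super.characters(ch, start, length);
        }
    }

    // Demo helper (an assumption for this sketch): returns the element names
    // that reach the downstream handler after filtering.
    static List<String> namesSeen(String xml, Set<String> wanted) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            KeepOnlyFilter filter = new KeepOnlyFilter(reader, wanted);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    seen.add(q);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Note that elements sitting between the root and a wanted element are dropped,
so the wanted subtrees end up as direct children of the root. The filtered
tree is smaller, but it is still built in one piece, which is the size issue
mentioned above.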
 
 
 -----Original Message-----
From: Jakob Jenkov [mailto:jakob at jenkov.com]
Sent: Monday, April 02, 2001 3:34 PM
To: jdom-interest at jdom.org
Subject: [jdom-interest] Partial Tree building/instantiation --- XPathFilter



Hi There.
 
I'm currently working on a long, long :-) project in which we parse through
some quite long files. We have tried converting these files to XML for
easier/standard parsing, but each file then comes to about 16-30+ MB.
I don't even dare think about how much memory such a JDOM tree
would take! And the plans for lazy evaluation won't help, since we
visit every node in the tree, thus instantiating all objects anyway.
Parsing the trees solely using SAX is not developer-friendly enough. What I
have in mind is some kind of XPath filter that lets me build JDOM
trees from subtrees of the data, and dispose of each tree when I no
longer need it. Let me give an example:
 
We parse phone call records in files that sometimes can contain thousands
and thousands of records. In XML format these files and records would look
something like this:
 
<transferBatch>
    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>

    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>
    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>
    ...
    ...
    ...
</transferBatch>
 
 
 
Each <phoneCall> record with all its sub records can be quite large, and
there can be thousands of these <phoneCall> records. I'd like some way to
get a JDOM tree for each <phoneCall> record one at a time, and to be able to
dispose of that <phoneCall> JDOM tree before moving on to the next. How
would I do that?
 
My suggestion would be to insert an XPathFilter that only builds JDOM trees
from the records that match the given XPath. In the example above, an XPath
of    /transferBatch/phoneCall   would do the job.
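
To make the idea concrete, here is a minimal sketch of one-record-at-a-time
processing. It is built on assumptions: it matches records by plain element
name rather than by a real XPath, and it uses the JDK's org.w3c.dom classes
as a stand-in for JDOM so that it runs without extra libraries. The names
RecordSplitter and detailsOf are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: builds one small DOM document per record element
// and passes it to a callback as soon as the record's end tag is seen.
class RecordSplitter extends DefaultHandler {
    private final String recordName;
    private final Consumer<Element> onRecord;
    private final DocumentBuilder builder;
    private final Deque<Element> open = new ArrayDeque<>(); // open elements of the current record
    private Document current;                               // document of the current record

    RecordSplitter(String recordName, Consumer<Element> onRecord) {
        this.recordName = recordName;
        this.onRecord = onRecord;
        try {
            this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (open.isEmpty()) {
            if (!qName.equals(recordName)) {
                return;                        // outside any record: build nothing
            }
            current = builder.newDocument();   // fresh tree for this record
        }
        Element e = current.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (open.isEmpty()) {
            current.appendChild(e);
        } else {
            open.peek().appendChild(e);
        }
        open.push(e);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            open.peek().appendChild(current.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (open.isEmpty()) {
            return;
        }
        Element done = open.pop();
        if (open.isEmpty()) {
            onRecord.accept(done);  // record complete: hand it over
            current = null;         // drop our reference so the tree can be collected
        }
    }

    // Demo helper (an assumption for this sketch): collects the text of each
    // <phoneCall> record, processing the records one at a time.
    static List<String> detailsOf(String xml) {
        List<String> out = new ArrayList<>();
        RecordSplitter handler =
            new RecordSplitter("phoneCall", rec -> out.add(rec.getTextContent().trim()));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

Only one record tree is referenced by the handler at any moment, so memory
use should follow the size of the largest <phoneCall> record rather than the
size of the whole file, provided the callback does not hold on to the trees
it receives.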
 
Do my complaints/ideas sound completely out-of-this-world? I think there
are many out there who will have the same problem: parsing one subtree at a
time, without regard to the others.
 
 
Regards,
Jakob Jenkov
jakob at jenkov.com
 
 
 
 
 
