[jdom-interest] Re: Manipulating a very large XML file

Jason Robbins jrobbins at tigris.org
Mon Mar 14 14:33:12 PST 2005



>We have a large XML file (around 5 GB) that should be modified based on
>certain business rules. What parser can be used other than DOM ? Is it
>possible to  create a tree structure just for the segment that should be
>modified ?

As others pointed out, if you have 5 GB of data, you probably should
not be keeping it in one XML file.  Did you mean 5 MB?


Actually, this brings up a question for the group:
Would people be interested in a memory-efficient DOM or JDOM
implementation?

I have found, and have read elsewhere, that just loading an XML file
of size N bytes typically uses 4*N to 8*N bytes of memory.  There are
a few tricks to avoid that, e.g., deferred parsing, but in practice
they seem to degrade quickly back to the same ratio.  The problem
is that a lot of applications work fine during testing but then 
run out of memory when an end-user tries to use a big input file 
(e.g., after they have been using your application enough to build 
up some data that they really care about).
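
If you want to check the ratio on your own data, a rough measurement
along these lines can help.  This is only a sketch: the class name
DomFootprint, the synthetic <item> document, and the 200,000-item size
are made up for illustration.  System.gc() is only a hint to the JVM,
so the printed number is approximate and will vary with the parser,
the JVM, and the shape of the document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomFootprint {

    /** Builds a synthetic XML document with the given number of item elements. */
    public static String sampleXml(int items) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id=\"").append(i).append("\">value ").append(i).append("</item>");
        }
        return sb.append("</root>").toString();
    }

    /** Approximate used heap in bytes, after asking (not forcing) the JVM to collect. */
    public static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Parses the bytes into a standard W3C DOM tree using the JDK's default parser. */
    public static Document parse(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = sampleXml(200_000).getBytes(StandardCharsets.UTF_8);

        long before = usedHeap();
        Document doc = parse(xml);
        // Walk the tree once so that a deferred DOM has to materialize its nodes.
        int items = doc.getElementsByTagName("item").getLength();
        long after = usedHeap();

        // doc is still referenced here, so it cannot be collected before measuring.
        System.out.printf("root=%s items=%d fileBytes=%d heapGrowth=%d ratio=%.1f%n",
                doc.getDocumentElement().getTagName(), items, xml.length,
                after - before, (after - before) / (double) xml.length);
    }
}
```

Pointing main() at one of your real input files instead of the
synthetic document gives a more useful figure.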

I am thinking of an internal data structure that would represent
the complete DOM or JDOM tree for an N-byte XML file in more like
N/2 to N bytes of RAM.  This data structure would be fully parsed
and provide quick access to any node.  It would present exactly
the same API that developers are using now.
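
To make the idea concrete, here is one possible direction, not a
finished design: keep per-node data in parallel int arrays, intern
element names once, and store all character data in one shared buffer,
instead of allocating one object per node.  Everything in this sketch
(the CompactTree class, its accessors, the TEXT marker) is hypothetical.
It ignores attributes, namespaces, comments, and mutation, and it does
not present the DOM or JDOM API; a real drop-in library would need a
thin wrapper layer on top for that.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical array-backed tree filled from SAX events; nodes are int indexes. */
public class CompactTree extends DefaultHandler {
    public static final int TEXT = -1; // nameId value that marks a text node

    // One slot per node in each array; -1 means "none" for the link arrays.
    private int[] nameId = new int[16], parent = new int[16], firstChild = new int[16],
            nextSibling = new int[16], lastChild = new int[16],
            textStart = new int[16], textLen = new int[16];
    private int count = 0;
    private int current = -1; // element currently open during parsing

    private final List<String> names = new ArrayList<>();           // interned element names
    private final Map<String, Integer> nameIndex = new HashMap<>();
    private final StringBuilder text = new StringBuilder();         // all character data

    public static CompactTree parse(String xml) throws Exception {
        CompactTree tree = new CompactTree();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), tree);
        return tree;
    }

    /** Appends a node as the last child of the currently open element. */
    private int newNode(int name) {
        if (count == nameId.length) grow();
        int n = count++;
        nameId[n] = name;
        parent[n] = current;
        firstChild[n] = nextSibling[n] = lastChild[n] = -1;
        if (current >= 0) {
            if (lastChild[current] < 0) firstChild[current] = n;
            else nextSibling[lastChild[current]] = n;
            lastChild[current] = n;
        }
        return n;
    }

    private void grow() {
        int cap = nameId.length * 2;
        nameId = Arrays.copyOf(nameId, cap);
        parent = Arrays.copyOf(parent, cap);
        firstChild = Arrays.copyOf(firstChild, cap);
        nextSibling = Arrays.copyOf(nextSibling, cap);
        lastChild = Arrays.copyOf(lastChild, cap);
        textStart = Arrays.copyOf(textStart, cap);
        textLen = Arrays.copyOf(textLen, cap);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        Integer id = nameIndex.get(qName);
        if (id == null) {
            id = names.size();
            names.add(qName);
            nameIndex.put(qName, id);
        }
        current = newNode(id);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        current = parent[current];
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        int last = current >= 0 ? lastChild[current] : -1;
        if (last >= 0 && nameId[last] == TEXT) {
            textLen[last] += length;      // SAX split one run of text; extend it
        } else {
            int n = newNode(TEXT);
            textStart[n] = text.length();
            textLen[n] = length;
        }
        text.append(ch, start, length);
    }

    public int size() { return count; }
    public int parent(int n) { return parent[n]; }
    public int firstChild(int n) { return firstChild[n]; }
    public int nextSibling(int n) { return nextSibling[n]; }
    /** Element name, or null for a text node. */
    public String name(int n) { return nameId[n] == TEXT ? null : names.get(nameId[n]); }
    /** Character data of a text node; empty for elements. */
    public String text(int n) { return text.substring(textStart[n], textStart[n] + textLen[n]); }
}
```

For example, CompactTree.parse("<a>foo<b>bar</b>baz</a>") produces
five nodes: element a, text "foo", element b, text "bar", and text
"baz".  Each node costs seven ints while loading (lastChild is only
needed during the build and could be dropped afterwards), plus its
share of the text buffer.  Whether that gets anywhere near N/2 to N
bytes depends heavily on the document.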

So, if such a (J)DOM library existed and could be easily dropped in,
would you use it?


Thanks,
jason!
-- 
P.S. You might also be interested in my latest project, ReadySET Pro.
http://www.readysetpro.com/


