[jdom-interest] JDOM and memory

Michael Kay mike at saxonica.com
Sat Jan 28 10:37:43 PST 2012

> Finally, I have in the past had some success with the concept of 
> 'reusing' String values. XML Parsers (like SAX, etc.) typically create 
> a new String instance for all the variables they pass. For example, 
> the Element names, prefixes, etc. are all new instances of String. 
> Thus, if you have hundreds of Elements called 'car' in your input XML, 
> you will get hundreds of different String Element names with the value 
> 'car'. I have built a class that does something similar to 
> String.intern() in order to rationalize the hundreds of 
> different-but-equals() values that are passed in by the parsers.
Have you measured how your optimization compares with the effect of 
setting the http://xml.org/sax/features/string-interning feature on the 
SAX parser?
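
One rough way to make that measurement is sketched below. It asks the parser for the string-interning feature and counts how many distinct String instances arrive as element names. The class and method names (InterningProbe, NameCounter, probe) are made up for illustration. Whether a given parser accepts the feature is an assumption, so the sketch records whether it was accepted rather than relying on it.

```java
import java.io.StringReader;
import java.util.IdentityHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JDOM or Saxon.
final class InterningProbe {
    static final String STRING_INTERNING =
            "http://xml.org/sax/features/string-interning";

    /** Counts element events and the distinct String instances used as qNames. */
    static final class NameCounter extends DefaultHandler {
        // IdentityHashMap keys by reference (==), not by equals().
        final Map<String, Boolean> instances = new IdentityHashMap<>();
        int elements;
        boolean interningAccepted;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
            instances.put(qName, Boolean.TRUE);
        }
    }

    static NameCounter probe(String xml, boolean requestInterning) {
        NameCounter counter = new NameCounter();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            if (requestInterning) {
                try {
                    reader.setFeature(STRING_INTERNING, true);
                    counter.interningAccepted = true;
                } catch (SAXException e) {
                    // Feature not recognised or not supported by this parser;
                    // carry on and measure the parser's default behaviour.
                }
            }
            reader.setContentHandler(counter);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return counter;
    }
}
```

Comparing instances.size() against elements, with and without the feature, on a real document would show how much duplication the parser itself already removes.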

Are you doing the interning in a way that guarantees strings can be 
compared using "==", and if so, are you taking advantage of this when 
doing the comparisons? The big win comes with XPath searches such as 
//x. Does the interning introduce any synchronization? (This is the big 
disadvantage with Saxon's NamePool - it speeds up XPath searching 
substantially, but the contention in a highly concurrent workload can 
become quite significant.)
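
To make the questions concrete, here is a minimal sketch of a pool that returns one canonical instance per distinct value, so that callers can compare names with "==". It is only an illustration: StringPool and its methods are hypothetical names, and it is neither the class described above nor Saxon's NamePool. It assumes a ConcurrentHashMap is acceptable as the shared structure. Lookups of known values take no lock, but the map is still shared state and can see contention under heavy concurrent insertion.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical pool for illustration only.
final class StringPool {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the single pooled instance that equals() the argument. */
    String canonical(String s) {
        if (s == null) {
            return null;
        }
        // Fast path: a plain get() does not lock when the value is already pooled.
        String existing = pool.get(s);
        if (existing != null) {
            return existing;
        }
        // Slow path: the first caller to register a value wins, others reuse it.
        existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Valid only when both arguments came from the same pool. */
    static boolean sameName(String a, String b) {
        return a == b;
    }
}
```

The "==" comparison is only safe if every name that reaches the comparison has passed through the same pool. A single unpooled String makes sameName() return false for equal values.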

Are you pooling the QName as a whole, or the local name, prefix and URI 
separately?

Michael Kay
> I have incorporated this 'caching' class in to a new JDOMFactory 
> called 'SlimJDOMFactory'. This factory 'normalizes' all String values 
> to a single instance of each unique String value. This significantly 
> reduces the amount of memory used in the JDOM tree especially if there 
> are lots of: similarly named attributes, elements, white-space-padding 
> in otherwise empty elements, or between elements. This process is 
> significantly slower though...
> For example, with the 'hamlet' test case, the 'baseline' memory 
> footprint for hamlet in JDOM is 2.27MB in 4.75ms.
> With the SlimJDOMFactory it is: 1.77MB in 8.5ms
> With Lazy AttributeList it is: 2.06MB in 4.55ms
> With both it is: 1.57MB in 8.3ms
> I am pushing both of these changes in to github. The AttributeList is 
> an easy one to justify. It is fully compatible with prior code, it has 
> positive memory and performance impacts.
> The SlimJDOMFactory is also justifiable when you consider:
> 1. the user has to decide to use it specifically.
> 2. The memory saving can be very significant.
> 3. Even though the parse time is slower, the GC time savings can be 
> significant if the document 'hangs around' for a long time - the 
> quicker GC time can add up fast.
> 4. When you have lots of code doing comparisons it is much faster to 
> do equals() calls on Strings that are == as well. It saves a hashCode 
> calculation as well as a string character scan to prove equals().
> Rolf
> On 02/01/2012 3:27 PM, Rolf wrote:
>> Hi all.
>> Memory optimization has never been a top priority for JDOM. At the same
>> time, for what it does, JDOM is not a 'terrible' memory user. Still, I
>> have done some analysis, and, I believe I can trim about a quarter to a
>> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
>> The first is to merge the ContentList class in to the Element class (and
>> also in to Document). This will reduce the number of Java objects by
>> about half, and that will save about 32 bytes per Element at a minimum
>> in a 64-bit JRE. Additionally, by lazy-initialization of the Content
>> array, we can save memory on otherwise 'empty' Elements.
>> This can be done by making the Element (and perhaps Document) class
>> implement 'List'. It can all be done in a 'backward compatible' way, but
>> also leads to some interesting possibilities, like:
>> for (Content c : element) {
>>     // ... do something
>> }
>> (for backward compatibility, Element.getContent() will return 'this').
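
The first change could look roughly like the sketch below. It is not JDOM code: SketchElement is a hypothetical class, it stores plain Object children instead of JDOM's Content type so that it stays self-contained, and it assumes AbstractList is an acceptable base class. The element is its own child list, the backing array is allocated only when the first child is added, and getContent() returns 'this'.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an Element that is also its own content list.
final class SketchElement extends AbstractList<Object> {
    private final String name;
    private Object[] content; // stays null until the first child is added
    private int size;

    SketchElement(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    /** Backward-compatible accessor: the element itself is the list. */
    List<Object> getContent() {
        return this;
    }

    boolean hasAllocatedContent() {
        return content != null;
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return content[index];
    }

    @Override
    public void add(int index, Object child) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        if (content == null) {
            content = new Object[4];                     // lazy allocation
        } else if (size == content.length) {
            content = Arrays.copyOf(content, size * 2);  // grow when full
        }
        // Shift the tail right by one slot and store the new child.
        System.arraycopy(content, index, content, index + 1, size - index);
        content[index] = child;
        size++;
        modCount++; // lets AbstractList iterators detect concurrent modification
    }
}
```

One side effect worth weighing in a real design: AbstractList supplies content-based equals() and hashCode(), so two elements with equal children would compare equal unless those methods are overridden.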
>> The second change is to make the AttributeList instance in Element a
>> lazy-initialization. This would save memory on all Elements that have no
>> attributes, but would have an impact for people who sub-class the
>> Element class and may expect the attributes field to be non-null.
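
A minimal sketch of the second change, again with hypothetical names (LazyAttrElement, and a name/value String pair standing in for JDOM's Attribute class). The 'attributes' field stays null until the first attribute is set, and every accessor in the class has to tolerate that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an Element with a lazily created attribute list.
class LazyAttrElement {
    // Each entry is {name, value}. Null until the first attribute is set.
    protected List<String[]> attributes;

    void setAttribute(String name, String value) {
        if (attributes == null) {
            attributes = new ArrayList<>(2); // allocate on first use only
        }
        for (String[] a : attributes) {
            if (a[0].equals(name)) {
                a[1] = value; // replace the value of an existing attribute
                return;
            }
        }
        attributes.add(new String[] {name, value});
    }

    String getAttributeValue(String name) {
        if (attributes == null) {
            return null; // no list yet, so no attributes
        }
        for (String[] a : attributes) {
            if (a[0].equals(name)) {
                return a[1];
            }
        }
        return null;
    }

    boolean hasAttributes() {
        return attributes != null && !attributes.isEmpty();
    }

    boolean isAttributeListAllocated() {
        return attributes != null;
    }
}
```

A subclass that reads the protected field directly, for example calling attributes.size() without a null check, would throw a NullPointerException on an element with no attributes. That is the compatibility risk described above.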
>> I am trying to get a feel for how important this sort of optimization
>> may be. If there is interest then I will make some changes, and test the
>> impact. I may make a separate branch in github to test it out....
>> If the above changes are unrealistic then I don't think it makes sense
>> to even try....
>> Rolf
>> _______________________________________________
>> To control your jdom-interest membership:
>> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com