[jdom-interest] JDOM and memory

Sun Jan 29 02:58:18 PST 2012

Rolf,

I do know there are applications (such as the one Michael reported on, which generates random prefixes) for which any form of pooling is dangerous; and you show that there are situations where interning performs worse than other pooling methods (I think hashCode might be seen as guilty, but that can't be changed).

Nonetheless, I believe the design that we had, where the element names were interned, is common: in the server application in question, the ActiveMath learning environment, the element names are everywhere in the Java code as well, e.g. for comparisons within if statements. So for this case interning is actually better than pooling overall.
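
To make the point about comparisons concrete, here is a minimal sketch in plain Java (no JDOM involved). The class name InternDemo, the constant CAR and the helper freshCopy are made up for illustration; freshCopy only stands in for a parser handing back a new String instance per element name.

```java
class InternDemo {
    // Hypothetical element-name constant, as it might appear all over application code.
    // String literals are interned by the JVM.
    static final String CAR = "car";

    // Stand-in for a parser that creates a new String instance for every name it reports.
    static String freshCopy(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String parsed = freshCopy("car");

        // Different instance, so an identity comparison fails...
        System.out.println(parsed == CAR);
        // ...while equals() succeeds after comparing the characters.
        System.out.println(parsed.equals(CAR));
        // After interning, the parsed name is the same instance as the literal.
        System.out.println(parsed.intern() == CAR);
    }
}
```

A private pool gives a single instance per name within the document, but that instance is not the one behind the literals in the code, which is why interning fits this usage better.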

I'm convinced many JDOM users take this approach; using JDOM is cute for Java programming, not for XSLT friends who only see the world as pipelines translatable into a set of Unix xsltproc calls.

I would suggest the following:
- make this configurable
- make this subclassable and exploitable

That is to let e.g. SAXBuilder have a method:

    public String makePooledName(String)

which would then call the right interning method (String.intern for those who want it, SlimJDOMFactory's by default?, nothing for those who fear retention).

Would that be in SAXBuilder or JDOMFactory? I'm afraid there's no global JDOM config object; that would be the place, e.g. so that it can also be called from new Element("name").
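
A rough sketch of what such a hook could look like, independent of where it ends up living. None of this is existing JDOM API: the NamePool interface, the NamePools holder and its three strategies are hypothetical names, and the map-based variant is only an assumption about how a per-document pool might be written, not a copy of SlimJDOMFactory.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical strategy interface; not part of JDOM. */
interface NamePool {
    String makePooledName(String name);
}

/** Hypothetical holder for the three behaviours mentioned above. */
final class NamePools {
    private NamePools() {}

    /** JVM-wide interning: results can be compared with == against literals. */
    static final NamePool INTERN = String::intern;

    /** No pooling at all, for those who fear retention. */
    static final NamePool NONE = name -> name;

    /**
     * A private map-based pool. It is not thread-safe and is meant to be
     * created once per parse, so it becomes garbage together with its owner.
     */
    static NamePool perDocument() {
        Map<String, String> seen = new HashMap<>();
        return name -> {
            String prior = seen.putIfAbsent(name, name);
            return prior != null ? prior : name;
        };
    }
}

class NamePoolDemo {
    public static void main(String[] args) {
        NamePool pool = NamePools.perDocument();
        String a = pool.makePooledName(new String("car"));
        String b = pool.makePooledName(new String("car"));
        // Both calls hand back the first instance that was seen.
        System.out.println(a == b);
    }
}
```

A subclass or a setter on whichever class owns the hook could then pick one of these, which would cover both the "configurable" and the "subclassable" points.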

paul

On 29 Jan 2012, at 02:41, Rolf Lear wrote:

> I have now compared the results of string-interning to the String-cache code.
> 
> The 'raw' code (neither SlimJDOMFactory nor string-interning) is:
> 2.06MB @ 4.55ms
> The SlimJDOMFactory is:
> 1.57MB @ 8ms
> The string-interning SAX Feature is:
> 2.06MB @ 6.1ms
> 
> Not sure how I got essentially zero improvement of memory.... got something wrong..... no... been checking, but I think the difference in using String.intern on element names only is so insignificant that it does not feature as much as 1%.....  perhaps all the difference is coming in whitespace....
> 
> Not worth checking in to it.... I don't believe String.intern() is the right answer regardless.
> 
> Rolf
> 
> 
> On 28/01/2012 1:37 PM, Michael Kay wrote:
>> 
>>> 
>>> 
>>> Finally, I have in the past had some success with the concept of
>>> 'reusing' String values. XML Parsers (like SAX, etc.) typically create
>>> a new String instance for all the variables they pass. For example,
>>> the Element names, prefixes, etc. are all new instances of String.
>>> Thus, if you have hundreds of Elements called 'car' in your input XML,
>>> you will get hundreds of different String Element names with the value
>>> 'car'. I have built a class that does something similar to
>>> String.intern() in order to rationalize the hundreds of
>>> different-but-equals() values that are passed in by the parsers.
>> Have you measured how your optimization compares with the effect of
>> setting the http://xml.org/sax/features/string-interning property on the
>> SAX parser?
>> 
>> Are you doing the interning in a way that guarantees strings can be
>> compared using "==", and if so, are you taking advantage of this when
>> doing the comparisons? The big win comes with XPath searches such as
>> //x. Does the interning introduce any synchronization? (This is the big
>> disadvantage with Saxon's NamePool - it speeds up XPath searching
>> substantially, but the contention in a highly concurrent workload can
>> become quite significant.)
>> 
>> Are you pooling the QName as a whole, or the local name, prefix and URI
>> separately?
>> 
>> Michael Kay
>> Saxonica
>>> 
>>> I have incorporated this 'caching' class in to a new JDOMFactory
>>> called 'SlimJDOMFactory'. This factory 'normalizes' all String values
>>> to a single instance of each unique String value. This significantly
>>> reduces the amount of memory used in the JDOM tree especially if there
>>> are lots of: similarly named attributes, elements, white-space-padding
>>> in otherwise empty elements, or between elements. This process is
>>> significantly slower though...
>>> 
>>> For example, with the 'hamlet' test case, the 'baseline' memory
>>> footprint for hamlet in JDOM is 2.27MB in 4.75ms.
>>> With the SlimJDOMFactory it is: 1.77MB in 8.5ms
>>> With Lazy AttributeList it is: 2.06MB in 4.55ms
>>> With both it is 1.57MB in 8.3ms
>>> 
>>> I am pushing both of these changes in to github. The AttributeList is
>>> an easy one to justify. It is fully compatible with prior code, it has
>>> positive memory and performance impacts.
>>> 
>>> The SlimJDOMFactory is also justifiable when you consider:
>>> 1. the user has to decide to use it specifically.
>>> 2. The memory saving can be very significant.
>>> 3. Even though the parse time is slower, the GC time savings can be
>>> significant if the document 'hangs around' for a long time - the
>>> quicker GC time can add up fast.
>>> 4. When you have lots of code doing comparisons it is much faster to
>>> do equals() calls on Strings that are == as well. It saves a hashCode
>>> calculation as well as a string character scan to prove equals().
>>> 
>>> Rolf
>>> 
>>> On 02/01/2012 3:27 PM, Rolf wrote:
>>>> Hi all.
>>>> 
>>>> Memory optimization has never been a top priority for JDOM. At the same
>>>> time, for what it does, JDOM is not a 'terrible' memory user. Still, I
>>>> have done some analysis, and, I believe I can trim about a quarter to a
>>>> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
>>>> 
>>>> The first is to merge the ContentList class in to the Element class (and
>>>> also in to Document). This will reduce the number of Java objects by
>>>> about half, and that will save about 32 bytes per Element at a minimum
>>>> in a 64-bit JRE. Additionally, by lazy-initialization of the Content
>>>> array, we can save memory on otherwise 'empty' Elements.
>>>> 
>>>> This can be done by changing the Element (and perhaps Document) class
>>>> to implement 'List'. It can all be done in a 'backward compatible' way, but
>>>> also leads to some interesting possibilities, like:
>>>> 
>>>> for (Content c : element) {
>>>>     // ... do something
>>>> }
>>>> 
>>>> (for backward compatibility, Element.getContent() will return 'this').
>>>> 
>>>> 
>>>> The second change is to make the AttributeList instance in Element a
>>>> lazy-initialization. This would save memory on all Elements that have no
>>>> attributes, but would have an impact for people who sub-class the
>>>> Element class and may expect the attributes field to be non-null.
>>>> 
>>>> 
>>>> I am trying to get a feel for how important this sort of optimization
>>>> may be. If there is interest then I will make some changes, and test the
>>>> impact. I may make a separate branch in github to test it out....
>>>> 
>>>> If the above changes are unrealistic then I don't think it makes sense
>>>> to even try....
>>>> 
>>>> Rolf
>>>> _______________________________________________
>>>> To control your jdom-interest membership:
>>>> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>>>> 
>>> 
>> 
> 