From paul at hoplahup.net Sat Nov 5 02:35:45 2011
From: paul at hoplahup.net (Paul Libbrecht)
Date: Sat, 5 Nov 2011 10:35:45 +0100
Subject: [jdom-interest] jdom 1.1.2 references missing maven artifact
In-Reply-To:
References:
Message-ID: <1C26E277-C590-417A-9506-44AF1C19BB71@hoplahup.net>

Don,

if the scope is compile, it should not be fetched when you build based on jdom, or is there something I do not understand in the way maven builds? This dependency is an *optional* dependency.

paul

On 5 Nov 2011 at 03:31, Don Corley wrote:

> The jdom maven pom at:
>
> <groupId>org.jdom</groupId>
> <artifactId>jdom</artifactId>
> <version>1.1.2</version>
>
> references this pom:
>
> <dependency>
>   <groupId>jaxen</groupId>
>   <artifactId>jaxen</artifactId>
>   <version>1.1.3</version>
>   <scope>compile</scope>
> </dependency>
>
> Which does not exist in maven central, so I get an error when I include
> jdom-1.1.2 in my maven project.
>
> Thanks.
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com

From jdom at tuis.net Sat Nov 5 12:41:26 2011
From: jdom at tuis.net (Rolf)
Date: Sat, 05 Nov 2011 15:41:26 -0400
Subject: [jdom-interest] jdom 1.1.2 references missing maven artifact
In-Reply-To:
References:
Message-ID: <4EB59166.4030803@tuis.net>

Hi Don.

I may have messed that up a bit... (actually, there's no 'maybe'...).

The predicament is that the current version of Jaxen is 1.1.3, and I used that version to build JDOM, so I set that version in the pom. The 1.1.1 version is from 2007....

I did not realize that jaxen was not maintained in Maven... since it quite clearly says on its home-page that they are 'ramping up to using maven'. Actually, there are still lots of things I do not know about maven.

So, I used the details of what I compiled with, rather than what's available in maven. Bear in mind that JDOM is not built using maven dependencies, it is built independently, and then 'deployed' on Maven to be available to all.
I will try to change the version dependency to be:

<dependency>
  <groupId>jaxen</groupId>
  <artifactId>jaxen</artifactId>
  <version>[1.1.1,1.2.0)</version>
  <optional>true</optional>
</dependency>

I believe this will have the effect of:
- making the dependencies work in maven since 1.1.1 is there, but, if you put a newer 1.1.3 in your local maven repository, then it will use that.
- making it not download jaxen at all.... but, you will need it if you use XPath mechanisms in JDOM.

I will see if I can update the current pom without having to release a new version....

Rolf

On 04/11/2011 10:31 PM, Don Corley wrote:
> The jdom maven pom at:
>
> <groupId>org.jdom</groupId>
> <artifactId>jdom</artifactId>
> <version>1.1.2</version>
>
> references this pom:
>
> <dependency>
>   <groupId>jaxen</groupId>
>   <artifactId>jaxen</artifactId>
>   <version>1.1.3</version>
>   <scope>compile</scope>
> </dependency>
>
> Which does not exist in maven central, so I get an error when I include
> jdom-1.1.2 in my maven project.
>
> Thanks.

From jdom at tuis.net Sat Nov 5 13:22:46 2011
From: jdom at tuis.net (Rolf)
Date: Sat, 05 Nov 2011 16:22:46 -0400
Subject: [jdom-interest] jdom 1.1.2 references missing maven artifact
In-Reply-To: <4EB59166.4030803@tuis.net>
References: <4EB59166.4030803@tuis.net>
Message-ID: <4EB59B16.8060004@tuis.net>

Hi again.

I think I have updated maven central with a new pom.xml (it takes a while to be updated).

The best I can do is make it optional.... and not specify a particular version for jaxen. This means that you will by default use jaxen 1.1.1. This is not ideal. You should still update your jaxen to 1.1.3, and 'complain' to jaxen that they have not updated maven-central.

So, in your case, despite me having updated maven-central, you (and everyone else) should still use the maven mechanism for dealing with this situation:

Manually download the 1.1.3 version of jaxen, and install it into your maven repository.
If you are on a 'unix' machine:

mkdir jaxentmp
cd jaxentmp
curl -O http://dist.codehaus.org/jaxen/distributions/jaxen-1.1.3.zip
unzip jaxen-1.1.3.zip
mvn install:install-file -DgroupId=jaxen -DartifactId=jaxen -Dversion=1.1.3 -Dpackaging=jar -Dfile=jaxen-1.1.3/jaxen-1.1.3.jar

After doing that, your jdom build should work.

I do not know the equivalent steps on windows/eclipse/intellij/whatever.

Rolf

On 05/11/2011 3:41 PM, Rolf wrote:
> Hi Don.
>
> I may have messed that up a bit... (actually, there's no 'maybe'...).
> ...

From jdom at tuis.net Sun Nov 6 05:26:46 2011
From: jdom at tuis.net (Rolf)
Date: Sun, 06 Nov 2011 08:26:46 -0500
Subject: [jdom-interest] jdom 1.1.2 references missing maven artifact
In-Reply-To:
References: <4EB59166.4030803@tuis.net> <4EB59B16.8060004@tuis.net>
Message-ID: <4EB68B16.6050301@tuis.net>

I pushed an update, but even though I see updated timestamps on the metadata and the 1.1.2 folder in maven-central, I don't see the new POM.

http://search.maven.org/#browse%7C-1946144149 - see timestamps
but nothing changed: http://search.maven.org/#browse%7C-167108894

The process is somewhat 'odd', and I don't fully understand it. I'm going to wait and see if the POM gets modified, but, if not, I'm not sure how to proceed.

Rolf

On 05/11/2011 6:26 PM, Don Corley wrote:
> Rolf,
>
> Thanks so much for your quick response.
>
> I've created a bug report with jaxen to try to get them to publish their
> code to maven central at:
> http://jira.codehaus.org/browse/JAXEN-217
>
> I'll watch for your new code to appear in maven central.
>
> Sometimes when I have a dependency that is not in maven central, I
> publish it myself there. Most open source licenses allow redistribution
> as long as you include their licenses. Instructions are here:
> https://docs.sonatype.org/display/Repository/Uploading+3rd-party+Artifacts+to+The+Central+Repository
>
> Hopefully they will upload their code soon!
>
> Thanks again!
>
> Don
> JiBX contributor (we use jdom!)
From jdom at tuis.net Mon Nov 7 07:59:46 2011
From: jdom at tuis.net (Rolf Lear)
Date: Mon, 07 Nov 2011 10:59:46 -0500
Subject: [jdom-interest] validating XML without throwing a Java exception
In-Reply-To:
References:
Message-ID: <5d561532526716985a68f42eb694f5c3@tuis.net>

Hi Cliff.

You can 'catch' the Exception for each string that you validate. For example:

import java.io.IOException;
import java.io.StringReader;
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

SAXBuilder builder = new SAXBuilder();
for (String maybexml : stringstotest) {
    try {
        StringReader reader = new StringReader(maybexml);
        Document doc = builder.build(reader);
        // ... do something with the well-formed document ...
    } catch (JDOMException jdome) {
        // That string was not well-formed XML
        // do something about it....
        System.out.println("String " + maybexml + " was not XML: " + jdome.getMessage());
    } catch (IOException ioe) {
        // StringReader should never throw an IOException.... but, just in case
        ioe.printStackTrace();
    }
}

Rolf

On Mon, 7 Nov 2011 10:42:35 -0500, cliff palmer wrote:
> I need to examine several hundred thousand text strings and accumulate a
> count of the number of strings containing "well-formed XML" (i.e. can be
> parsed with SAXBuilder and then used) vs "poorly-formed" (i.e. something in
> the string prevents successful parsing). My (limited) experience is that
> an exception (such as a JDOMException thrown by SAXBuilder.build()) will
> halt execution.
> How can I validate the XML in these strings without halting execution?
> Thanks!
> Cliff

From jdom at tuis.net Tue Nov 8 07:44:25 2011
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 08 Nov 2011 10:44:25 -0500
Subject: [jdom-interest] Is saxbuilder safe for multi-threaded (concurrent) use?
In-Reply-To:
References:
Message-ID: <73070bab19cdd3e21f6e15423aa962e6@tuis.net>

SAXBuilder is *not* thread-safe, but it is reusable....

i.e. you need one SAXBuilder per thread, but after parsing something, you can reuse the builder for the next parse.
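The "one parser per thread, reused between parses" rule can be sketched with JDK classes only. This is a minimal sketch, not JDOM code: `javax.xml.parsers.DocumentBuilder` stands in for SAXBuilder (it is likewise reusable but not thread-safe), and the class name `PerThreadParserSketch`, the method `countWellFormed` and the striped work split are assumptions made for illustration. It also shows Cliff's counting case: a parse failure is caught per string, so it does not halt the loop.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: the JDK DocumentBuilder is a stand-in for JDOM's SAXBuilder.
class PerThreadParserSketch {

    /** Counts the well-formed strings; each worker thread owns exactly one parser. */
    static int countWellFormed(List<String> inputs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int first = t;
                tasks.add(() -> {
                    // Created inside the task, so it is never shared across threads.
                    DocumentBuilder parser =
                            DocumentBuilderFactory.newInstance().newDocumentBuilder();
                    // DefaultHandler rethrows fatal errors without printing them.
                    parser.setErrorHandler(new DefaultHandler());
                    int ok = 0;
                    // Each task takes every threads-th string, starting at its own index.
                    for (int i = first; i < inputs.size(); i += threads) {
                        try {
                            parser.parse(new InputSource(new StringReader(inputs.get(i))));
                            ok++;
                        } catch (SAXException | IOException e) {
                            // Not well-formed: skip it and reuse the parser for the next string.
                        }
                    }
                    return ok;
                });
            }
            int total = 0;
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With JDOM the same shape would presumably apply, with a SAXBuilder created inside each task and JDOMException caught in the inner loop.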
You can run many different SAXBuilders 'in parallel', just so long as no specific SAXBuilder instance is referenced from different threads.

For improved performance set the fast reconfigure and reuseParser flags on the builder:

builder.setReuseParser(true);
builder.setFastReconfigure(true);

Rolf

On Tue, 8 Nov 2011 10:23:06 -0500, cliff palmer wrote:
> I have a large number (hundreds of thousands) of XML streams embedded in
> files to examine and was considering a multi-threaded design for the java
> code, dispatching the streams to a number of worker threads that would call
> SAXBuilder.build() then continue with the needed work. I'm wondering if
> SAXBuilder.build, and the other JDOM code on which it relies, is "thread
> safe" so that I don't run into problems with concurrency using it.
> Obviously my code will have to be concurrency-friendly as well and I
> will have to manage resources carefully.
> Thanks in advance.
> Cliff

From paul at hoplahup.net Tue Nov 8 08:02:08 2011
From: paul at hoplahup.net (Paul Libbrecht)
Date: Tue, 8 Nov 2011 17:02:08 +0100
Subject: [jdom-interest] Is saxbuilder safe for multi-threaded (concurrent) use?
In-Reply-To: <73070bab19cdd3e21f6e15423aa962e6@tuis.net>
References: <73070bab19cdd3e21f6e15423aa962e6@tuis.net>
Message-ID:

Does that mean the best design pattern is to use thread-locals with these?

paul

On 8 Nov 2011 at 16:44, Rolf Lear wrote:

>
> SAXBuilder is *not* thread-safe, but it is reusable....
>
> i.e. you need one SAXBuilder per thread, but after parsing something, you
> can reuse the builder for the next parse.
>
> You can run many different SAXBuilders 'in parallel', just so long as no
> specific SAXBuilder instance is referenced from different threads.
>
> For improved performance set the fast reconfigure and reuseParser flags on
> the builder:
>
> builder.setReuseParser(true);
> builder.setFastReconfigure(true);
>
> Rolf
>
>
> On Tue, 8 Nov 2011 10:23:06 -0500, cliff palmer wrote:
>> I have a large number (hundreds of thousands) of XML streams embedded in
>> files to examine and was considering a multi-threaded design for the java
>> code, dispatching the streams to a number of worker threads that would call
>> SAXBuilder.build() then continue with the needed work. I'm wondering if
>> SAXBuilder.build, and the other JDOM code on which it relies, is "thread
>> safe" so that I don't run into problems with concurrency using it.
>> Obviously my code will have to be concurrency-friendly as well and I
>> will have to manage resources carefully.
>> Thanks in advance.
>> Cliff

From jdom at tuis.net Tue Nov 8 08:45:10 2011
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 08 Nov 2011 11:45:10 -0500
Subject: [jdom-interest] Is saxbuilder safe for multi-threaded (concurrent) use?
In-Reply-To:
References: <73070bab19cdd3e21f6e15423aa962e6@tuis.net>
Message-ID: <4d6cced9d4098e0a6861434f4869e977@tuis.net>

To clarify this all, it all boils down to the fact that XMLReader (and other SAX classes/interfaces) are not specified to be thread-safe, so, at its lowest level, SAXBuilder can never guarantee to be thread-safe.

SAXBuilder has other internal fields that are not protected either, but it would be pointless to protect them since the actual parser itself is not thread-safe.

The safest thing to do (the only safe thing to do) is to ensure that you never use a SAXBuilder across threads.

There are lots of 'patterns' you can use to ensure this.
Ones that come to mind are:
- create a new SAXBuilder instance each time you need one
- create a 'pool' and check-out/check-in each builder.
- 'synchronize' the block containing the SAXBuilder.
- using thread-locals
- ....

The pattern you use is very dependent on your particular circumstances... I can't recommend any particular one. In the most common case, though, where most time is spent actually processing the document information rather than parsing the input, the simplest approach is to just create a parser when you need it... In tight loops, like Cliff is doing, I would just initialise each thread with its own SAXBuilder, and reuse it in the thread.

More complicated patterns than that are probably more 'expensive' to manage than any 'savings' that can be found.

Rolf

On Tue, 8 Nov 2011 17:02:08 +0100, Paul Libbrecht wrote:
> Does that mean the best design pattern is to use thread-locals with these?
>
> paul
>
>
> On 8 Nov 2011 at 16:44, Rolf Lear wrote:
>
>>
>> SAXBuilder is *not* thread-safe, but it is reusable....
>>
>> i.e. you need one SAXBuilder per thread, but after parsing something, you
>> can reuse the builder for the next parse.
>>
>> You can run many different SAXBuilders 'in parallel', just so long as no
>> specific SAXBuilder instance is referenced from different threads.
>>
>> For improved performance set the fast reconfigure and reuseParser flags
>> on the builder:
>>
>> builder.setReuseParser(true);
>> builder.setFastReconfigure(true);
>>
>> Rolf
>>
>>
>> On Tue, 8 Nov 2011 10:23:06 -0500, cliff palmer wrote:
>>> I have a large number (hundreds of thousands) of XML streams embedded in
>>> files to examine and was considering a multi-threaded design for the java
>>> code, dispatching the streams to a number of worker threads that would call
>>> SAXBuilder.build() then continue with the needed work.
>>> I'm wondering if
>>> SAXBuilder.build, and the other JDOM code on which it relies, is "thread
>>> safe" so that I don't run into problems with concurrency using it.
>>> Obviously my code will have to be concurrency-friendly as well and I
>>> will have to manage resources carefully.
>>> Thanks in advance.
>>> Cliff

From jdom at tuis.net Tue Nov 8 11:19:50 2011
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 08 Nov 2011 14:19:50 -0500
Subject: [jdom-interest] jdom 1.1.2 references missing maven artifact
In-Reply-To:
References: <4EB59166.4030803@tuis.net> <4EB59B16.8060004@tuis.net> <4EB68B16.6050301@tuis.net>
Message-ID:

Hi Don.

To educate me a bit better (so I do it better next time), can you answer the following for me:

- Where do the cobertura and findbugs dependencies come from? Jaxen? I happen to use both in JDOM, did they come from there?
- Should I just 'remove' the dependency on Jaxen, and 'document' it somewhere that if you want to use XPath you need Jaxen?
- If I should keep the Jaxen 'dependency', should I mark it optional?
- If I should keep the Jaxen 'dependency', should I add these exclusions to the JDOM pom's jaxen dependencies so that they are already there?
- Is there anything else you can see that is 'broken' with the Maven deploy?

Thanks

Rolf

On Tue, 8 Nov 2011 10:15:34 -0800, Don Corley wrote:
> I went ahead and deployed the jaxen 1.1.3 artifact that was missing.
>
> Unfortunately, there are a few downstream artifacts that still can't be
> found, so you'll need to exclude them.
>
> To include jdom 1.1.2 as a dependency in your maven project, just add:
>
> <dependency>
>   <groupId>org.jdom</groupId>
>   <artifactId>jdom</artifactId>
>   <version>1.1.2</version>
>   <scope>compile</scope>
>   <exclusions>
>     <exclusion>
>       <groupId>maven-plugins</groupId>
>       <artifactId>maven-cobertura-plugin</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>maven-plugins</groupId>
>       <artifactId>maven-findbugs-plugin</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
>
> I reworked the downstream project file, so this should be fixed in the next
> version.
>
> Cheers!
>
> Don

From randallt at us.ibm.com Thu Nov 10 09:24:32 2011
From: randallt at us.ibm.com (Randall Theobald)
Date: Thu, 10 Nov 2011 11:24:32 -0600
Subject: [jdom-interest] JDOM parser reuse memory problem
Message-ID:

Hi, I'm a performance analyst and found a spot where a product I'm analyzing is using JDOM. We are creating new SAXBuilders on each thread and are ending up with a hot lock on the classloader when trying to load up the XMLReader. I saw that the underlying parser in SAXBuilder can be reused, thus leading to a proper pooling strategy, but I have a memory concern. In the case where the parser is reused, nothing is cleared from it at the end of the build method (so the content handler is still held, which can reference lots of objects). Since SAXBuilder doesn't expose a way to clear anything on the reused parser, the only options are to use ugly reflection to clear it, or to use (slightly less ugly) WeakReferences to the SAXBuilders in my pool so that they eventually get cleaned up.

Is there a reason that the content handler on 'this.parser' isn't set to null along with the local content handler being set to null in the finally block of the build method? If not, I'd suggest this change.
Randall Theobald
Performance: WebSphere Business Process Management & Connectivity
IBM Software Group
randallt at us.ibm.com
Austin, TX
512-286-8870 t/l: 363-8870

From mike at saxonica.com Thu Nov 10 09:57:46 2011
From: mike at saxonica.com (Michael Kay)
Date: Thu, 10 Nov 2011 17:57:46 +0000
Subject: [jdom-interest] JDOM parser reuse memory problem
In-Reply-To:
References:
Message-ID: <4EBC109A.9020908@saxonica.com>

On 10/11/2011 17:24, Randall Theobald wrote:
> Hi, I'm a performance analyst and found a spot where a product I'm
> analyzing is using JDOM. We are creating new SAXBuilders on each thread and
> are ending up with a hot lock on the classloader when trying to load up the
> XMLReader. I saw that the underlying parser in SAXBuilder can be reused,
> thus leading to a proper pooling strategy, but I have a memory concern. In
> the case where the parser is reused, nothing is cleared from it at the end
> of the build method (so the content handler is still held, which can
> reference lots of objects). Since SAXBuilder doesn't expose a way to clear
> anything on the reused parser, the only options are to use ugly reflection to
> clear it, or to use (slightly less ugly) WeakReferences to the SAXBuilders
> in my pool so that they eventually get cleaned up.
>
> Is there a reason that the content handler on 'this.parser' isn't set to
> null along with the local content handler being set to null in the finally
> block of the build method? If not, I'd suggest this change.
>
>

I have the same problem in Saxon. When returning a parser to the pool I set all the callbacks to null (ContentHandler, LexicalHandler, etc). Unfortunately some XMLReader implementations don't allow the callback to be set to null (the specs aren't explicit on the point). One approach is to catch the exception, another is to set a dummy ContentHandler or whatever that doesn't have any references to anything. Messy.
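The dummy-handler variant mentioned above might look roughly like this. It is a minimal sketch using only JDK SAX classes, not code from Saxon or JDOM; the class name `ReaderResetSketch` and the methods `newReader` and `release` are hypothetical names chosen for illustration. It swaps every callback for a stateless no-op handler rather than null, which sidesteps the question of whether a given XMLReader accepts null.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: detaches application handlers before a reader is pooled.
class ReaderResetSketch {

    // DefaultHandler2 implements ContentHandler, DTDHandler, ErrorHandler and
    // LexicalHandler with empty methods, and it references no document state.
    static final DefaultHandler2 NO_OP = new DefaultHandler2();

    /** Creates a plain JAXP-backed XMLReader (not namespace-aware by default). */
    static XMLReader newReader() throws ParserConfigurationException, SAXException {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    /** Replaces all callbacks so the idle reader no longer pins the last document. */
    static void release(XMLReader reader) {
        reader.setContentHandler(NO_OP);
        reader.setDTDHandler(NO_OP);
        reader.setErrorHandler(NO_OP);
        try {
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", NO_OP);
        } catch (SAXException e) {
            // This reader does not recognise or support the property: nothing to clear.
        }
    }
}
```

A pool would call `release(reader)` when a reader is checked back in, and the next user would install its own handlers before calling `parse`.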
Michael Kay
Saxonica

From jdom at tuis.net Thu Nov 10 10:51:41 2011
From: jdom at tuis.net (Rolf Lear)
Date: Thu, 10 Nov 2011 13:51:41 -0500
Subject: [jdom-interest] JDOM parser reuse memory problem
In-Reply-To: <4EBC109A.9020908@saxonica.com>
References: <4EBC109A.9020908@saxonica.com>
Message-ID:

Hi Randall, Michael.

It's an interesting observation... and I can see the implications. I would like to take a closer look at it, but that may take a little while.

I filed https://github.com/hunterhacker/jdom/issues/52

'Off the cuff' I can think of one work-around and a few solutions (in addition to what Michael has suggested):

1. Immediately after parsing your real document you then parse a dummy/small/in-memory document (even invalid - and catch the exception).
2. Currently, when you do not reuse the parser, it goes back to 'first principles' and queries JAXP, etc. to find a parser instance. Instead it could 'cache' the parser 'source' after the first time, and then just create a new instance, instead of doing all the class-based lookups... JAXP and other data sources are not going to change mid-way through the JVM/ClassLoader lifetime.... That way you could abandon parser re-use, but the cost of new parser instances would be much reduced....
3. Make the SAXHandler 'cleanable' and 'clean' it in the finally block.
4. Set the content handler for the *next* parse at the end of the *current* parse....

I would be reluctant to put out another 1.x build of JDOM until there's more than just this issue to fix, and, hopefully, there are no other issues to fix in the 1.x stream, so I would not hold your breath for another 1.x release, but, regardless, and if possible, can you:

1. give some indication of how much of an issue this is?
2. wait for JDOM2? (month or so...)
3. tell me whether you found any other hot-spots?
Thanks

Rolf

On Thu, 10 Nov 2011 17:57:46 +0000, Michael Kay wrote:
> On 10/11/2011 17:24, Randall Theobald wrote:
>> Hi, I'm a performance analyst and found a spot where a product I'm
>> analyzing is using JDOM. We are creating new SAXBuilders on each thread
>> and are ending up with a hot lock on the classloader when trying to load
>> up the XMLReader. I saw that the underlying parser in SAXBuilder can be
>> reused, thus leading to a proper pooling strategy, but I have a memory
>> concern. In the case where the parser is reused, nothing is cleared from
>> it at the end of the build method (so the content handler is still held,
>> which can reference lots of objects). Since SAXBuilder doesn't expose a
>> way to clear anything on the reused parser, the only options are to use
>> ugly reflection to clear it, or to use (slightly less ugly)
>> WeakReferences to the SAXBuilders in my pool so that they eventually get
>> cleaned up.
>>
>> Is there a reason that the content handler on 'this.parser' isn't set to
>> null along with the local content handler being set to null in the
>> finally block of the build method? If not, I'd suggest this change.
>>
>>
> I have the same problem in Saxon. When returning a parser to the pool I
> set all the callbacks to null (ContentHandler, LexicalHandler, etc).
> Unfortunately some XMLReader implementations don't allow the callback to
> be set to null (the specs aren't explicit on the point). One approach is
> to catch the exception, another is to set a dummy ContentHandler or
> whatever that doesn't have any references to anything. Messy.
>
> Michael Kay
> Saxonica

From mike at saxonica.com Fri Nov 11 00:33:30 2011
From: mike at saxonica.com (Michael Kay)
Date: Fri, 11 Nov 2011 08:33:30 +0000
Subject: [jdom-interest] JDOM parser reuse memory problem
In-Reply-To:
References: <4EBC109A.9020908@saxonica.com>
Message-ID: <4EBCDDDA.7080208@saxonica.com>

On 10/11/2011 18:51, Rolf Lear wrote:
> Hi Randall, Michael.
>
> It's an interesting observation... and I can see the implications. I would
> like to take a closer look at it, but that may take a little while.
>
> I filed https://github.com/hunterhacker/jdom/issues/52
>
> 'Off the cuff' I can think of one work-around and a few solutions (in
> addition to what Michael has suggested)
>
> 1. immediately after parsing your real document you then parse a
> dummy/small/in-memory document (even invalid - and catch the exception).
> 2. Currently, when you do not reuse the parser, it goes back to 'first
> principles' and queries JAXP, etc. to find a parser instance. Instead it
> could 'cache' the parser 'source' after the first time, and then just
> create a new instance, instead of doing all the class-based lookups...

Ouch. Creating a new parser to parse a small document is a cost that it's nice to avoid, but it isn't going to kill you. Going through the JAXP factory process to get a new ParserFactory is a monstrous cost that can dominate all other processing - and reusing the factory costs nothing.
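The factory-reuse point can be sketched with plain JAXP. This is a minimal illustration, not SAXBuilder's code; the class name `ParserFactoryCacheSketch` and its method names are hypothetical. The expensive lookup (`SAXParserFactory.newInstance()`) runs once, and each later request only asks the cached factory for a fresh parser. Access is synchronized because the JAXP documentation does not promise that factories are thread-safe.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper: looks the factory up once, then hands out new readers.
class ParserFactoryCacheSketch {

    // The service/classloader lookup happens once, when this class is initialised.
    private static final SAXParserFactory FACTORY = createFactory();

    private static SAXParserFactory createFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory;
    }

    /** Returns a new, unshared XMLReader without repeating the factory lookup. */
    static XMLReader newReader() throws ParserConfigurationException, SAXException {
        synchronized (FACTORY) {
            return FACTORY.newSAXParser().getXMLReader();
        }
    }
}
```

Each caller still gets its own reader, so nothing is shared across threads except the guarded factory.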
Michael Kay
Saxonica

From jdom at tuis.net Fri Nov 11 03:21:53 2011
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 11 Nov 2011 06:21:53 -0500
Subject: [jdom-interest] JDOM parser reuse memory problem
In-Reply-To: <4EBCDDDA.7080208@saxonica.com>
References: <4EBC109A.9020908@saxonica.com> <4EBCDDDA.7080208@saxonica.com>
Message-ID: <4EBD0551.3020205@tuis.net>

On 11/11/2011 3:33 AM, Michael Kay wrote:
> On 10/11/2011 18:51, Rolf Lear wrote:
>> Hi Randall, Michael.
>>
>> It's an interesting observation... and I can see the implications. I
>> would like to take a closer look at it, but that may take a little while.
>>
>> I filed https://github.com/hunterhacker/jdom/issues/52
>>
>> 'Off the cuff' I can think of one work-around and a few solutions (in
>> addition to what Michael has suggested)
>>
>> 1. immediately after parsing your real document you then parse a
>> dummy/small/in-memory document (even invalid - and catch the exception).
>> 2. Currently, when you do not reuse the parser, it goes back to 'first
>> principles' and queries JAXP, etc. to find a parser instance. Instead it
>> could 'cache' the parser 'source' after the first time, and then just
>> create a new instance, instead of doing all the class-based lookups...
>
> Ouch. Creating a new parser to parse a small document is a cost that
> it's nice to avoid, but it isn't going to kill you. Going through the
> JAXP factory process to get a new ParserFactory is a monstrous cost
> that can dominate all other processing - and reusing the factory costs
> nothing.
>
> Michael Kay
> Saxonica
>

Not sure what you are saying... is the 'ouch' about the problem it has at the moment, or about the suggestion to skip the JAXP processing on subsequent non-reuse-parser parses?

I have not yet had a close look at the problem... the potential option of not going back to first principles on subsequent parses may not be (easily) possible....
Unless Randall can convince me otherwise, I'm going to finish working on some StAX outputter code I am embroiled in, and then look at it.

Rolf

From randallt at us.ibm.com Fri Nov 11 04:31:38 2011
From: randallt at us.ibm.com (Randall Theobald)
Date: Fri, 11 Nov 2011 06:31:38 -0600
Subject: [jdom-interest] JDOM parser reuse memory problem
In-Reply-To: <4EBD0551.3020205@tuis.net>
References: <4EBC109A.9020908@saxonica.com> <4EBCDDDA.7080208@saxonica.com> <4EBD0551.3020205@tuis.net>
Message-ID:

I'd vote for a proper fix that resets the XMLReader at the end of the build method. Our product is stuck on 1.0-beta-7 or something, which doesn't even have the parser reuse in it yet, so there is no hurry. I just wanted to make sure that the issue got looked at moving forward.

Randall Theobald
Performance: WebSphere Business Process Management & Connectivity
IBM Software Group
randallt at us.ibm.com
Austin, TX
512-286-8870 t/l: 363-8870

From: Rolf Lear
To: Michael Kay ,
Cc: jdom-interest at jdom.org
Date: 11/11/2011 05:31 AM
Subject: Re: [jdom-interest] JDOM parser reuse memory problem
Sent by: jdom-interest-bounces at jdom.org

On 11/11/2011 3:33 AM, Michael Kay wrote:
> On 10/11/2011 18:51, Rolf Lear wrote:
>> Hi Randall, Michael.
>>
>> It's an interesting observation... and I can see the implications. I
>> would like to take a closer look at it, but that may take a little while.
>>
>> I filed https://github.com/hunterhacker/jdom/issues/52
>>
>> 'Off the cuff' I can think of one work-around and a few solutions (in
>> addition to what Michael has suggested)
>>
>> 1. immediately after parsing your real document you then parse a
>> dummy/small/in-memory document (even invalid - and catch the exception).
>> 2. Currently, when you do not reuse the parser, it goes back to 'first
>> principles' and queries JAXP, etc.
>> to find a parser instance. Instead it
>> could 'cache' the parser 'source' after the first time, and then just
>> create a new instance, instead of doing all the class-based lookups...
>
> Ouch. Creating a new parser to parse a small document is a cost that
> it's nice to avoid, but it isn't going to kill you. Going through the
> JAXP factory process to get a new ParserFactory is a monstrous cost
> that can dominate all other processing - and reusing the factory costs
> nothing.
>
> Michael Kay
> Saxonica
>

Not sure what you are saying... is the 'ouch' about the problem it has at the moment, or about the suggestion to skip the JAXP processing on subsequent non-reuse-parser parses?

I have not yet had a close look at the problem... the potential option of not going back to first principles on subsequent parses may not be (easily) possible....

Unless Randall can convince me otherwise, I'm going to finish working on some StAX outputter code I am embroiled in, and then look at it.

Rolf

From jdom at tuis.net Sun Nov 13 18:03:36 2011
From: jdom at tuis.net (Rolf)
Date: Sun, 13 Nov 2011 21:03:36 -0500
Subject: [jdom-interest] StAX support
Message-ID: <4EC076F8.4020900@tuis.net>

Hi All.

I have been trying to put StAX support into JDOM for a little while now, and I have just pushed through the code to github that contains the majority of the anticipated API on the JDOM side for handling the StAX parsing/processing of XML.

I have been using as references the StAX specification, the JDOM 'way' of doing things, and the rest of the web.

Some observations I have:
1. StAX is currently the fastest way (slightly) to parse XML on my computer.
2. The StAX specification is perhaps the very worst specification I have ever seen for functionality currently in the Java language/API. I hope that other concepts in the JCP process have better results.
3. XML Validation with StAX is 'hard'.
4. DOCTYPE handling in StAX is unpredictable.
5. After having been around for almost as long as JDOM, the StAX concept is still 'dynamic' and changing.

Essentially, I have had a long hard look at it, and I think there were a number of oversights in the process.... it's a good concept that has had a poor implementation.

On the other hand, I have put a fair amount of thought into it, and gone a long way to making it work well in JDOM (within the limitations of StAX), and there may be some use in it.

My thinking is that I will leave the code in there for the moment, but it is incomplete, and I really need to work on something else in the meantime.

It is still a 50/50 as to whether it should be in there, or be stripped out again.
What I would really like is to get in touch with a StAX 'expert' and run some of my concerns past them.

Is there someone on this list with some StAX insight?
Is there a forum anyone knows of that's dedicated to the StAX implementation in Java?

Anyway, I would appreciate it if some people with StAX experience played with the code:

    String filename = "myfile.xml";
    StreamSource source = new StreamSource(new File(filename));
    XMLInputFactory inputfac = XMLInputFactory.newInstance();
    // Implementation-specific (Sun/JDK parser) property:
    // report CDATA sections as separate CDATA events.
    inputfac.setProperty(
        "http://java.sun.com/xml/stream/properties/report-cdata-event",
        Boolean.TRUE);
    XMLStreamReader reader = inputfac.createXMLStreamReader(source);

    // Build a JDOM Document from the StAX stream.
    StAXStreamBuilder stxb = new StAXStreamBuilder();
    Document staxbuild = stxb.build(reader);

Rolf

From mike at saxonica.com Mon Nov 14 01:07:45 2011
From: mike at saxonica.com (Michael Kay)
Date: Mon, 14 Nov 2011 09:07:45 +0000
Subject: [jdom-interest] StAX support
In-Reply-To: <4EC076F8.4020900@tuis.net>
References: <4EC076F8.4020900@tuis.net>
Message-ID: <4EC0DA61.7050807@saxonica.com>

On 14/11/2011 02:03, Rolf wrote:
> Hi All.
>
> I have been trying to put StAX support in to JDOM for a little while
> now, and I have just pushed through the code to github that contains
> the majority of the anticipated API on the JDOM side for handling the
> StAX parsing/processing of XML.

To what purpose? If you're building a tree, there are no usability benefits in using a pull parser rather than a push parser. If there are any performance benefits, then they are (a) very small, and (b) accidents of the implementation rather than anything architectural. Architecturally, there are disadvantages because it is harder to insert other functionality (filters, validators etc) into the parsing pipeline.

>
> I have been using as references the StAX specification, the JDOM 'way'
> of doing things, and the rest of the web.
>
> Some observations I have:
> 1. StAX is currently the fastest way (slightly) to parse XML on my
> computer.

Which parser?
Woodstox is fast, but it's also fast in push mode. > 2. The StAX specification is perhaps the very worst specification I > have ever seen for functionality currently in the Java language/API. I > hope that other concepts in the JCP process have better results. Agree 100%. There have been a lot of interoperability issues with StAX parsers as a result. Exception handling is a disaster area. > 3. XML Validation with StAX is 'hard'. Because pull pipelines are more difficult to construct than push pipelines. > 4. DOCTYPE handling in StAX is unpredictable. I'm not sure the "in StAX" is needed in that sentence... > 5. after having been around for almost as long as JDOM, the StAX > concept is still 'dynamic' and changing. Actually I see it as pretty dormant. It's an idea that really hasn't taken on significantly. I've been supporting StAX in Saxon for years and I see very little evidence that anyone uses it. The only parser that reached a decent level of maturity and stability was Woodstox, and that now seems to be stable with little further development. > > Essentially, I have had a long hard look at it, and I think there were > a number of oversights in the process.... it's a good concept that has > had a poor implementation. > > On the other hand, I have put a fair amount of thought in to it, and > gone a long way to making it work well in JDOM (within the limitations > of StAX), and there may be some use in it. > > My thinking is that I will leave the code in there for the moment, but > it is incomplete, and I really need to work on something else in the > meantime. > > It is still a 50/50 as to whether it should be in there, or be > stripped out again. I'd vote against, on balance. It's feature creep - added complexity with very little benefit. (And if someone really needs to get the output of a StAX parser into a JDOM tree, they can always use a Saxon identity transformer with a Stax Source and a JDOM Result.) 
>
> What I would really like is to get in touch with a StAX 'expert' and
> run some of my concerns past them.

Tatu Saloranta of Woodstox fame is your man.

>

Regards,

Michael Kay
Saxonica

From elharo at ibiblio.org Mon Nov 14 05:02:32 2011
From: elharo at ibiblio.org (Elliotte Rusty Harold)
Date: Mon, 14 Nov 2011 08:02:32 -0500
Subject: [jdom-interest] StAX support
In-Reply-To: <4EC076F8.4020900@tuis.net>
References: <4EC076F8.4020900@tuis.net>
Message-ID:

I agree with Michael. There's no gain in supporting StAX. I would not invest time or resources into it.

--
Elliotte Rusty Harold
elharo at ibiblio.org

From jdom at tuis.net Mon Nov 14 07:30:30 2011
From: jdom at tuis.net (Rolf Lear)
Date: Mon, 14 Nov 2011 10:30:30 -0500
Subject: [jdom-interest] StAX support
In-Reply-To:
References: <4EC076F8.4020900@tuis.net>
Message-ID:

Hi Elliotte

I'll reply to you, not Michael, because you have expressed the stronger sentiment....

Is your assessment a statement on the future of StAX too? You feature quite significantly as a proponent of StAX so it is somewhat interesting to see your comments.

Anyway, I think the 'no gain in supporting StAX' is a little 'harsh'; my impression is that it is more of a gray area than your 'absolute' statement... there must be *some* benefit to StAX, since JDOM support has been requested for years.... even if only sporadically...

If I were to try to summarize my sentiment it would be: Good idea, bad execution, could be fixed.

In more detail, I would say that the 'fragment parsing' part of the StAX philosophy is a big advantage. I have spent a lot of time looking at that feature as it is the biggest 'value' that StAX offers, as far as I can tell. I guess the concept can be adapted to SAX and DOM parsing though. There is nothing philosophically 'wrong' with pull-parsing, it has as much value as push-parsing (just not as much history), so I don't want to discount it on principle...
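
As a small illustration of the pull style and the 'fragment' idea mentioned above, the following sketch walks a document with the JDK's own `XMLStreamReader` and extracts only the pieces it cares about. The class `PullFragmentDemo`, the helper `itemTexts`, and the element name `item` are all hypothetical, chosen for this example; nothing here is JDOM API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class PullFragmentDemo {
    // Hypothetical helper: collect the text of every <item> element and
    // skip everything else. The caller owns the loop and decides when to
    // advance the parser.
    static List<String> itemTexts(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> items = new ArrayList<String>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag;
                    // it only works for text-only elements.
                    items.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return items;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            itemTexts("<list><item>a</item><skip>x</skip><item>b</item></list>"));
    }
}
```

A tree builder that accepts a reader already positioned on a start element could, in principle, build just that subtree; that is the 'fragment' use the paragraph above has in mind.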
JDOM is about making XML handling easy, philosophically JDOM should make it easy to work with what you have, and, if that happens to be a StAX source/sink, then it should be easy to feed that in to (out of) JDOM.... Further, support of StAX is entrenched in JAXP, and it seems like it is a 'void' to have no support for it in JDOM. ...is there really nothing good about it? If nothing else, I want to have some sort of 'official' statement on StAX: i.e. I want to be at the point where JDOM2 does one of the following: 1. fully supports StAX 2. supports StAX with documented limitations 3. provides a well documented rationale for *not* supporting StAX Unfortunately (for me) I have already spent a fair amount of time getting my head in to the StAX world, so I now have a bias, but I think that option 2 is reasonable... I don't particularly want to throw out everything I did, but I can't (right now) see a way to make the support complete. Rolf On Mon, 14 Nov 2011 08:02:32 -0500, Elliotte Rusty Harold wrote: > I agree with Michael. There's no gain in supporting StAX. I would > invest time or resources into it. From mike at saxonica.com Mon Nov 14 08:25:51 2011 From: mike at saxonica.com (Michael Kay) Date: Mon, 14 Nov 2011 16:25:51 +0000 Subject: [jdom-interest] StAX support In-Reply-To: References: <4EC076F8.4020900@tuis.net> Message-ID: <4EC1410F.3010708@saxonica.com> >JDOM is about making XML handling easy, philosophically JDOM should make it easy to work with what you have, and, if that happens to be a StAX source/sink, then it should be easy to feed that in to (out of) JDOM.... I think there's a need to identify use cases. If people are starting with lexical XML and want to build a JDOM representation of it, then they don't care what kind of parser is used. 
If they have some source of StAX events and want to build a JDOM representation of it, then that's a potential use case; but I'm not sure it's a very convincing one because most products/components that want to output XML prefer to push it rather than making it available to be pulled; the reason for that is the same as the justification for StAX in the first place: programmers like to own the main control loop, which means they like to pull their input and push their output. (From that perspective, supporting the StAX push interfaces would actually make more sense than supporting the StAX pull interface, because if you want to output XML, sending it to a StAX XMLStreamWriter is much easier than sending it to a SAX ContentHandler. However, SAX is so entrenched as the canonical push API that XMLStreamWriter isn't going to displace it any time soon.) Another factor here is JDK 1.5. StAX isn't present in JDK 1.5 by default, so any use of it in a product designed to work with JDK 1.5 is going to cause a configuration hassle. Michael Kay Saxonica From jdom at tuis.net Mon Nov 14 11:22:43 2011 From: jdom at tuis.net (Rolf Lear) Date: Mon, 14 Nov 2011 14:22:43 -0500 Subject: [jdom-interest] StAX support In-Reply-To: <1321297621.47078.YahooMailNeo@web161020.mail.bf1.yahoo.com> References: <4EC076F8.4020900@tuis.net> <1321297621.47078.YahooMailNeo@web161020.mail.bf1.yahoo.com> Message-ID: <0004bc9b9513acc21356adfa17d1094b@tuis.net> hi Tatu You did contribute (although it is not in 'contrib'). It's referenced on the 'issue' https://github.com/hunterhacker/jdom/issues/7 I have taken a good look at it, and I decided that I could not use it 'as-is', for the primary reason that it performs too much 'filtering' of content on the input side. Also, it does not use the 'normal' JDOM ways for input and output. I used it as a reference for a bunch of things, but, I have gone much, much further, implementing fragment parsing, and working on Event as well as Stream readers/writers. 
https://github.com/hunterhacker/jdom/commit/59ee0d5384843d841c9a1f5fe8dd3b8fda2c8524 Since the above commit, it has changed a lot more.... But, the notes on that commit basically identify why I re-did it instead of using your contribution as-is. Also, I am regretting the DTD stuff, which I am finding to be the biggest issue with StAX (as I am sure you have found too). Rolf On Mon, 14 Nov 2011 11:07:01 -0800 (PST), Tatu Saloranta wrote: > Hmmh. I thought I had contributed Stax builders way back when... what > happened to those pieces? > > For what it's worth, code is/was available at: > > http://woodstox.codehaus.org/StaxMisc > > and I thought ended up on jdom sandbox/contrib section. > > -+ Tatu +- > > > > ________________________________ > From: Rolf > To: jdom > Sent: Sunday, November 13, 2011 6:03 PM > Subject: [jdom-interest] StAX support > > Hi All. > > I have been trying to put StAX support in to JDOM for a little while now, > and I have just pushed through the code to github that contains the > majority of the anticipated API on the JDOM side for handling the StAX > parsing/processing of XML. > > I have been using as references the StAX specification, the JDOM 'way' of > doing things, and the rest of the web. > > Some observations I have: > 1. StAX is currently the fastest way (slightly) to parse XML on my > computer. > 2. The StAX specification is perhaps the very worst specification I have > ever seen for functionality currently in the Java language/API. I hope that > other concepts in the JCP process have better results. > 3. XML Validation with StAX is 'hard'. > 4. DOCTYPE handling in StAX is unpredictable. > 5. after having been around for almost as long as JDOM, the StAX concept > is still 'dynamic' and changing. > > Essentially, I have had a long hard look at it, and I think there were a > number of oversights in the process.... it's a good concept that has had a > poor implementation. 
>
> On the other hand, I have put a fair amount of thought in to it, and gone
> a long way to making it work well in JDOM (within the limitations of StAX),
> and there may be some use in it.
>
> My thinking is that I will leave the code in there for the moment, but it
> is incomplete, and I really need to work on something else in the meantime.
>
> It is still a 50/50 as to whether it should be in there, or be stripped
> out again.
>
> What I would really like is to get in touch with a StAX 'expert' and run
> some of my concerns past them.
>
> Is there someone on this list with some StAX insight?
> Is there a forum anyone knows of that's dedicated to the StAX
> implementation in Java?
>
> Anyway, I would appreciate it if some people with StAX experience played
> with the code:
>
> String filename = "myfile.xml";
> StreamSource source = new StreamSource(new File(filename));
> XMLInputFactory inputfac = XMLInputFactory.newInstance();
> inputfac.setProperty(
>     "http://java.sun.com/xml/stream/properties/report-cdata-event",
>     Boolean.TRUE);
> XMLStreamReader reader = inputfac.createXMLStreamReader(source);
>
> StAXStreamBuilder stxb = new StAXStreamBuilder();
> Document staxbuild = stxb.build(reader);
>
>
> Rolf
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com

From cowtowncoder at yahoo.com Mon Nov 14 11:41:39 2011
From: cowtowncoder at yahoo.com (Tatu Saloranta)
Date: Mon, 14 Nov 2011 11:41:39 -0800 (PST)
Subject: [jdom-interest] StAX support
In-Reply-To: <0004bc9b9513acc21356adfa17d1094b@tuis.net>
References: <4EC076F8.4020900@tuis.net> <1321297621.47078.YahooMailNeo@web161020.mail.bf1.yahoo.com> <0004bc9b9513acc21356adfa17d1094b@tuis.net>
Message-ID: <1321299699.26611.YahooMailNeo@web161004.mail.bf1.yahoo.com>

Ok, good -- I have nothing against improved version(s), just hoped existing one might give some inspiration.
And sounds like it did. DTD access (or lack thereof) is unfortunate indeed. Thank you for explanation & good luck with further improvements, sounds like you have been improving jdom quite a bit. -+ Tatu +- ----- Original Message ----- From: Rolf Lear To: Tatu Saloranta Cc: jdom interest Sent: Monday, November 14, 2011 11:22 AM Subject: Re: [jdom-interest] StAX support hi Tatu You did contribute (although it is not in 'contrib'). It's referenced on the 'issue' https://github.com/hunterhacker/jdom/issues/7 I have taken a good look at it, and I decided that I could not use it 'as-is', for the primary reason that it performs too much 'filtering' of content on the input side. Also, it does not use the 'normal' JDOM ways for input and output. I used it as a reference for a bunch of things, but, I have gone much, much further, implementing fragment parsing, and working on Event as well as Stream readers/writers. https://github.com/hunterhacker/jdom/commit/59ee0d5384843d841c9a1f5fe8dd3b8fda2c8524 Since the above commit, it has changed a lot more.... But, the notes on that commit basically identify why I re-did it instead of using your contribution as-is. Also, I am regretting the DTD stuff, which I am finding to be the biggest issue with StAX (as I am sure you have found too). Rolf On Mon, 14 Nov 2011 11:07:01 -0800 (PST), Tatu Saloranta wrote: > Hmmh. I thought I had contributed Stax builders way back when... what > happened to those pieces? > > For what it's worth, code is/was available at: > > http://woodstox.codehaus.org/StaxMisc > > and I thought ended up on jdom sandbox/contrib section. > > -+ Tatu +- > > > > ________________________________ > From: Rolf > To: jdom > Sent: Sunday, November 13, 2011 6:03 PM > Subject: [jdom-interest] StAX support > > Hi All. 
> > I have been trying to put StAX support in to JDOM for a little while now, > and I have just pushed through the code to github that contains the > majority of the anticipated API on the JDOM side for handling the StAX > parsing/processing of XML. > > I have been using as references the StAX specification, the JDOM 'way' of > doing things, and the rest of the web. > > Some observations I have: > 1. StAX is currently the fastest way (slightly) to parse XML on my > computer. > 2. The StAX specification is perhaps the very worst specification I have > ever seen for functionality currently in the Java language/API. I hope that > other concepts in the JCP process have better results. > 3. XML Validation with StAX is 'hard'. > 4. DOCTYPE handling in StAX is unpredictable. > 5. after having been around for almost as long as JDOM, the StAX concept > is still 'dynamic' and changing. > > Essentially, I have had a long hard look at it, and I think there were a > number of oversights in the process.... it's a good concept that has had a > poor implementation. > > On the other hand, I have put a fair amount of thought in to it, and gone > a long way to making it work well in JDOM (within the limitations of StAX), > and there may be some use in it. > > My thinking is that I will leave the code in there for the moment, but it > is incomplete, and I really need to work on something else in the meantime. > > It is still a 50/50 as to whether it should be in there, or be stripped > out again. > > What I would really like is to get in touch with a StAX 'expert' and run > some of my concerns past them. > > Is there someone on this list with some StAX insight? > Is there a forum anyone knows of that's dedicated to the StAX > implementation in Java? 
>
> Anyway, I would appreciate it if some people with StAX experience played
> with the code:
>
> String filename = "myfile.xml";
> StreamSource source = new StreamSource(new File(filename));
> XMLInputFactory inputfac = XMLInputFactory.newInstance();
> inputfac.setProperty(
>     "http://java.sun.com/xml/stream/properties/report-cdata-event",
>     Boolean.TRUE);
> XMLStreamReader reader = inputfac.createXMLStreamReader(source);
>
> StAXStreamBuilder stxb = new StAXStreamBuilder();
> Document staxbuild = stxb.build(reader);
>
>
> Rolf
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com

From jdom at tuis.net Mon Nov 14 11:56:43 2011
From: jdom at tuis.net (Rolf Lear)
Date: Mon, 14 Nov 2011 14:56:43 -0500
Subject: [jdom-interest] StAX support
In-Reply-To: <1321297752.26405.YahooMailNeo@web161019.mail.bf1.yahoo.com>
References: <4EC076F8.4020900@tuis.net> <1321297752.26405.YahooMailNeo@web161019.mail.bf1.yahoo.com>
Message-ID: <32132bf778f11cab23922ec3b17d6651@tuis.net>

hi Tatu

In my mind I think it would be reasonable to support StAX source/sink in JDOM with the following conditions:

Input:
- JDOM will ignore DTD events unless explicitly configured to receive them. If they are expected, they must be a full DTD "" not just an 'internal subset' or other invalid value. (this eliminates woodstox as a 'supported' parser I think as it only provides the internal subset), and the internal Java6 implementation creates a partial/truncated doctype ""
- JDOM content can be output as fragments to partially-written XML*Writers on the condition that the writer is at an appropriate state before the JDOM write happens.

I think the behaviour of the various StAX parsers/libraries is consistent enough to provide a reasonable base for the above restrictions....

Any 'expert' observations/criticisms/suggestions?
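
To make the second condition concrete, here is a minimal sketch of what "a partially-written writer at an appropriate state" could look like, using only the JDK's `XMLStreamWriter`. The method `writeFragment` is a hypothetical stand-in for whatever JDOM outputter would emit the content; it is not an existing JDOM method, and the class name is invented for the example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class FragmentWriteDemo {
    // Hypothetical stand-in for "output some content as a fragment".
    // It assumes the writer is positioned where element content is legal.
    static void writeFragment(XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartElement("child");
        writer.writeCharacters("text");
        writer.writeEndElement();
    }

    static String buildDocument() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("root"); // writer is now inside <root>
        writeFragment(writer);            // fragment lands as a child of <root>
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(buildDocument());
    }
}
```

Calling `writeFragment` before `writeStartElement("root")` and then adding further top-level elements would be an example of the writer not being at an appropriate state, which is the case the condition is meant to rule out.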

Rolf

On Mon, 14 Nov 2011 11:09:12 -0800 (PST), Tatu Saloranta wrote:
> Just because you do not see value does not mean there is no value.
>
> Number one reason for adding support is interoperability, meaning that
> there is often need to convert between streaming/incremental APIs and tree
> model.
>
> -+ Tatu +-
>
>
>
> ________________________________
> From: Elliotte Rusty Harold
> To: Rolf
> Cc: jdom
> Sent: Monday, November 14, 2011 5:02 AM
> Subject: Re: [jdom-interest] StAX support
>
> I agree with Michael. There's no gain in supporting StAX. I would not
> invest time or resources into it.

From jdom at tuis.net Mon Nov 14 17:29:00 2011
From: jdom at tuis.net (Rolf)
Date: Mon, 14 Nov 2011 20:29:00 -0500
Subject: [jdom-interest] End-of-line sequence.
Message-ID: <4EC1C05C.7010201@tuis.net>

Hi all.

JDOM has been merrily using "\r\n" as an end-of-line sequence in the XMLOutputter since 'forever'. The XML Spec indicates that all end-of-line sequences should be normalized to a single '\n': http://www.w3.org/TR/REC-xml/#sec-line-ends

The wording is such that XML parsers should clear out any extra '\r' characters if there are any, so it is not as if the code is completely broken.

But, I think it makes sense to follow the spec, and avoid having different XML compared to other systems.

I propose changing the line separator to follow the spec, but this has a very large impact on anyone who has expectations on JDOM having a particular line-terminator, even though they shouldn't...

I have filed https://github.com/hunterhacker/jdom/issues/53

The original decision was made by Elliotte: http://markmail.org/message/gv7m3xjgrkomrfe7 (it's worth noting that it was changed from the 'platform default' to the constant '\r\n' to create some consistency too).

vvv quote vvv

The one open question in this version is what to use for a line separator. Right now I'm using \r\n since that's most cross-platform compatible and friendliest to various network protocols.
However, \n alone might be slightly friendlier to XML parsers. Another possibility is to ask for System.getProperty("line.separator"). However, I'm loathe to make the output platform dependent. What do people think? ^^^ quote ^^^ Also, the commit introducing this has interesting comments: https://github.com/hunterhacker/jdom/commit/958fb22a4c7088b82f0d48a933bdf4e5c6806151#L0R173 Two issues I see: 1. "\r\n" was chosen for 'Network protocol' friendliness... is this still a valid argument? 2. is it OK to change the standard format of all the XML that JDOM produces? (I have been really careful (so far) for the most part to ensure all whitespace (including indents and EOL/EOF is not changed) ). I see changing the default EOL as being an easy decision, especially since users can still change it back easily on their Format instance. advantages: 1. Most XML tools do not use "\r" values - better compatibility? 2. XML output will be slightly smaller - ;-) 3. XML produced by 'other' outputters (currently the StAX outputters) can be compared directly with XMLOutputter for testing/compatibility disadvantages: 1. people may have 'baselines' that contain \r\n terminators, which will then be different from JDOM's default output. 2. there may be some (obscure) protocols that require \r\n terminators and users of JDOM2 will have to override the EOL to be '\r\n' for those. Anyone have comments/suggestions? Rolf From elharo at ibiblio.org Mon Nov 14 17:57:41 2011 From: elharo at ibiblio.org (Elliotte Rusty Harold) Date: Mon, 14 Nov 2011 20:57:41 -0500 Subject: [jdom-interest] End-of-line sequence. In-Reply-To: <4EC1C05C.7010201@tuis.net> References: <4EC1C05C.7010201@tuis.net> Message-ID: On Mon, Nov 14, 2011 at 8:29 PM, Rolf wrote: > Hi all. > > JDOM has been merrily using "\r\n" as an end-of-line sequence in the > XMLOutputer since 'forever'. 
The XML Spec indicates that all end-of-line > sequences should be normalized to a single '\n': > http://www.w3.org/TR/REC-xml/#sec-line-ends The wording is such that XML > parsers should clear out any extra '\r' characters if there are any, so it > is not as if the code is completely broken. > > But, I think it makes sense to follow the spec, and avoid having different > XML compared to other systems. > > I propose changing the line separator to follow the spec, but this has a > very large impact on anyone who has expectations on JDOM having a particular > line-terminator, even though they shouldn't... You're confusing input with output. \r\n is fully compliant with the XML specification. Line separators can be chosen or changed already by supplying an appropriately configured Format. -- Elliotte Rusty Harold elharo at ibiblio.org From jhunter at servlets.com Mon Nov 14 18:02:17 2011 From: jhunter at servlets.com (Jason Hunter) Date: Mon, 14 Nov 2011 18:02:17 -0800 Subject: [jdom-interest] End-of-line sequence. In-Reply-To: <4EC1C05C.7010201@tuis.net> References: <4EC1C05C.7010201@tuis.net> Message-ID: <53F8C02A-915F-4FF1-9852-4B54113175C3@servlets.com> I can see this causing people some random, hard-to-figure-out pain. I'd never want to do a change like this on the 1.x branch. But on the 2.x branch? It's a possibility. -jh- On Nov 14, 2011, at 5:29 PM, Rolf wrote: > Hi all. > > JDOM has been merrily using "\r\n" as an end-of-line sequence in the XMLOutputer since 'forever'. The XML Spec indicates that all end-of-line sequences should be normalized to a single '\n': http://www.w3.org/TR/REC-xml/#sec-line-ends The wording is such that XML parsers should clear out any extra '\r' characters if there are any, so it is not as if the code is completely broken. > > But, I think it makes sense to follow the spec, and avoid having different XML compared to other systems. 
> > I propose changing the line separator to follow the spec, but this has a very large impact on anyone who has expectations on JDOM having a particular line-terminator, even though they shouldn't... > > I have filed https://github.com/hunterhacker/jdom/issues/53 > > The original decision was made by Elliotte: http://markmail.org/message/gv7m3xjgrkomrfe7 (it's worth noting that it was changed from the 'platform default' to the constant '\r\n' to create some consistency too). > > vvv quote vvv > > The one open question in this version is what to use for a line separator. Right now I'm using \r\n since that's most cross-platform compatible and friendliest to various network protocols. However, \n alone might be slightly friendlier to XML parsers. Another possibility is to ask for System.getProperty("line.separator"). However, I'm loathe to make the output platform dependent. What do people think? > > ^^^ quote ^^^ > > Also, the commit introducing this has interesting comments: https://github.com/hunterhacker/jdom/commit/958fb22a4c7088b82f0d48a933bdf4e5c6806151#L0R173 > > Two issues I see: > 1. "\r\n" was chosen for 'Network protocol' friendliness... is this still a valid argument? > > 2. is it OK to change the standard format of all the XML that JDOM produces? (I have been really careful (so far) for the most part to ensure all whitespace (including indents and EOL/EOF is not changed) ). > > I see changing the default EOL as being an easy decision, especially since users can still change it back easily on their Format instance. > > advantages: > 1. Most XML tools do not use "\r" values - better compatibility? > 2. XML output will be slightly smaller - ;-) > 3. XML produced by 'other' outputters (currently the StAX outputters) can be compared directly with XMLOutputter for testing/compatibility > > disadvantages: > 1. people may have 'baselines' that contain \r\n terminators, which will then be different from JDOM's default output. > 2. 
there may be some (obscure) protocols that require \r\n terminators and users of JDOM2 will have to override the EOL to be '\r\n' for those. > > Anyone have comments/suggestions? > > Rolf > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com From jdom at tuis.net Mon Nov 14 18:04:55 2011 From: jdom at tuis.net (Rolf) Date: Mon, 14 Nov 2011 21:04:55 -0500 Subject: [jdom-interest] End-of-line sequence. In-Reply-To: References: <4EC1C05C.7010201@tuis.net> Message-ID: <4EC1C8C7.1020604@tuis.net> On 14/11/2011 8:57 PM, Elliotte Rusty Harold wrote: > On Mon, Nov 14, 2011 at 8:29 PM, Rolf wrote: >> Hi all. >> >> JDOM has been merrily using "\r\n" as an end-of-line sequence in the >> XMLOutputer since 'forever'. The XML Spec indicates that all end-of-line >> sequences should be normalized to a single '\n': >> http://www.w3.org/TR/REC-xml/#sec-line-ends The wording is such that XML >> parsers should clear out any extra '\r' characters if there are any, so it >> is not as if the code is completely broken. >> >> But, I think it makes sense to follow the spec, and avoid having different >> XML compared to other systems. >> >> I propose changing the line separator to follow the spec, but this has a >> very large impact on anyone who has expectations on JDOM having a particular >> line-terminator, even though they shouldn't... > > You're confusing input with output. \r\n is fully compliant with the > XML specification. I'm not really confusing input/output, the assumption is the XMLOutput from JDOM will be input for some parser somewhere.... so, JDOM output is input for something. Perhaps I should have indicated that what I meant by 'the spec' is to make JDOM XMLOutput 'by default' look more like what it does inside the JDOM memory model. As for the specification, yes, \r\n is compliant with parser input, but, my point is two-fold: 1. 
that it is 'guaranteed' to be different to the 'XML infoset' after parsing 2. 'everyone else' does it as a simple '\n', so why are we 'stuck' on \r\n? > > Line separators can be chosen or changed already by supplying an > appropriately configured Format. > This is also my point exactly... so, we set the default to '\n', then the people who 'need' \r\n can set it easily. The only issue as I see it is: is there a compelling reason for '\r\n' that I can't see? What are these 'network protocols' that need \r\n? Rolf From jdom at tuis.net Mon Nov 14 18:22:55 2011 From: jdom at tuis.net (Rolf) Date: Mon, 14 Nov 2011 21:22:55 -0500 Subject: [jdom-interest] End-of-line sequence. In-Reply-To: <53F8C02A-915F-4FF1-9852-4B54113175C3@servlets.com> References: <4EC1C05C.7010201@tuis.net> <53F8C02A-915F-4FF1-9852-4B54113175C3@servlets.com> Message-ID: <4EC1CCFF.1040201@tuis.net> I was anticipating the JDOM2 branch, yes. The 'significance' on JDOM2 is that I am comparing output with other 'standard' tools (xmllint, DOM, etc), and inspecting the differences, and this is coming up as one. I currently view it as a low-risk easy-win concept.... other than people who have baseline/regression tests with a particular format (and for the moment, I think it is only JDOM's regression/junit tests that expect the EOL sequence to be any particular value...). I have already broken that (in JDOM2) though for people using other 'pretty' formats because JDOM was issuing double-newline-sequences at the end-of-file, and it now only issues one. As for the random&hard-to-figure out pain, I am not sure.... it all depends on how you look at it: people who 'depend' on the \r\n sequence are just as likely to have those sorts of issues regardless of the JDOM setting.... There is *nothing* that should depend on the EOL sequence, thus, it *should* be safe to change.... ... and again, it comes down to 'why is \r\n better than \n'? 
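
The claim that parsed content never sees the "\r" can be checked with a few lines against the JDK's built-in DOM parser. This is only a sketch: the class `LineEndDemo` and helper `parsedText` are hypothetical names, and the example relies on the end-of-line handling required by section 2.11 of the XML specification rather than on anything in JDOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LineEndDemo {
    // Hypothetical helper: parse the XML text and return the text content
    // of the root element as the application would see it.
    static String parsedText(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The serialized form uses \r\n between the two words.
        String text = parsedText("<a>one\r\ntwo</a>");
        // A conforming parser hands back a single \n instead.
        System.out.println(text.equals("one\ntwo"));
        System.out.println(text.indexOf('\r'));
    }
}
```

So a document written with "\r\n" and one written with "\n" are indistinguishable after parsing; the choice only matters to tools that look at the raw bytes, such as text editors or byte-for-byte baseline comparisons.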
I can think of reasons why \n is better than \r\n, but not the other way around.... Rolf On 14/11/2011 9:02 PM, Jason Hunter wrote: > I can see this causing people some random, hard-to-figure-out pain. I'd never want to do a change like this on the 1.x branch. But on the 2.x branch? It's a possibility. > > -jh- > > On Nov 14, 2011, at 5:29 PM, Rolf wrote: > >> Hi all. >> >> JDOM has been merrily using "\r\n" as an end-of-line sequence in the XMLOutputer since 'forever'. The XML Spec indicates that all end-of-line sequences should be normalized to a single '\n': http://www.w3.org/TR/REC-xml/#sec-line-ends The wording is such that XML parsers should clear out any extra '\r' characters if there are any, so it is not as if the code is completely broken. >> >> But, I think it makes sense to follow the spec, and avoid having different XML compared to other systems. >> >> I propose changing the line separator to follow the spec, but this has a very large impact on anyone who has expectations on JDOM having a particular line-terminator, even though they shouldn't... >> >> I have filed https://github.com/hunterhacker/jdom/issues/53 >> >> The original decision was made by Elliotte: http://markmail.org/message/gv7m3xjgrkomrfe7 (it's worth noting that it was changed from the 'platform default' to the constant '\r\n' to create some consistency too). >> >> vvv quote vvv >> >> The one open question in this version is what to use for a line separator. Right now I'm using \r\n since that's most cross-platform compatible and friendliest to various network protocols. However, \n alone might be slightly friendlier to XML parsers. Another possibility is to ask for System.getProperty("line.separator"). However, I'm loathe to make the output platform dependent. What do people think? >> >> ^^^ quote ^^^ >> >> Also, the commit introducing this has interesting comments: https://github.com/hunterhacker/jdom/commit/958fb22a4c7088b82f0d48a933bdf4e5c6806151#L0R173 >> >> Two issues I see: >> 1. 
"\r\n" was chosen for 'Network protocol' friendliness... is this still a valid argument? >> >> 2. is it OK to change the standard format of all the XML that JDOM produces? (I have been really careful (so far) for the most part to ensure all whitespace (including indents and EOL/EOF is not changed) ). >> >> I see changing the default EOL as being an easy decision, especially since users can still change it back easily on their Format instance. >> >> advantages: >> 1. Most XML tools do not use "\r" values - better compatibility? >> 2. XML output will be slightly smaller - ;-) >> 3. XML produced by 'other' outputters (currently the StAX outputters) can be compared directly with XMLOutputter for testing/compatibility >> >> disadvantages: >> 1. people may have 'baselines' that contain \r\n terminators, which will then be different from JDOM's default output. >> 2. there may be some (obscure) protocols that require \r\n terminators and users of JDOM2 will have to override the EOL to be '\r\n' for those. >> >> Anyone have comments/suggestions? >> >> Rolf >> _______________________________________________ >> To control your jdom-interest membership: >> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > From mike at saxonica.com Tue Nov 15 01:17:28 2011 From: mike at saxonica.com (Michael Kay) Date: Tue, 15 Nov 2011 09:17:28 +0000 Subject: [jdom-interest] End-of-line sequence. In-Reply-To: <4EC1C05C.7010201@tuis.net> References: <4EC1C05C.7010201@tuis.net> Message-ID: <4EC22E28.8090706@saxonica.com> The only real case I can see for using \r\n is that if you use \n, you can't view the output in Notepad (and presumably a few other diehard Windows applications). Making a change will break a few people's unit tests, but no real applications should be dependent on the precise bitstream. Using \r\n does seem a bit quirky, but I can't see that it does any harm. -0, if that's an allowed vote. Michael Kay Saxonica On 15/11/2011 01:29, Rolf wrote: > Hi all. 
> > JDOM has been merrily using "\r\n" as an end-of-line sequence in the > XMLOutputer since 'forever'. The XML Spec indicates that all > end-of-line sequences should be normalized to a single '\n': > http://www.w3.org/TR/REC-xml/#sec-line-ends The wording is such that > XML parsers should clear out any extra '\r' characters if there are > any, so it is not as if the code is completely broken. > > But, I think it makes sense to follow the spec, and avoid having > different XML compared to other systems. > > I propose changing the line separator to follow the spec, but this has > a very large impact on anyone who has expectations on JDOM having a > particular line-terminator, even though they shouldn't... > > I have filed https://github.com/hunterhacker/jdom/issues/53 > > The original decision was made by Elliotte: > http://markmail.org/message/gv7m3xjgrkomrfe7 (it's worth noting that > it was changed from the 'platform default' to the constant '\r\n' to > create some consistency too). > > vvv quote vvv > > The one open question in this version is what to use for a line > separator. Right now I'm using \r\n since that's most cross-platform > compatible and friendliest to various network protocols. However, \n > alone might be slightly friendlier to XML parsers. Another possibility > is to ask for System.getProperty("line.separator"). However, I'm > loathe to make the output platform dependent. What do people think? > > ^^^ quote ^^^ > > Also, the commit introducing this has interesting comments: > https://github.com/hunterhacker/jdom/commit/958fb22a4c7088b82f0d48a933bdf4e5c6806151#L0R173 > > Two issues I see: > 1. "\r\n" was chosen for 'Network protocol' friendliness... is this > still a valid argument? > > 2. is it OK to change the standard format of all the XML that JDOM > produces? (I have been really careful (so far) for the most part to > ensure all whitespace (including indents and EOL/EOF is not changed) ). 
> > I see changing the default EOL as being an easy decision, especially > since users can still change it back easily on their Format instance. > > advantages: > 1. Most XML tools do not use "\r" values - better compatibility? > 2. XML output will be slightly smaller - ;-) > 3. XML produced by 'other' outputters (currently the StAX outputters) > can be compared directly with XMLOutputter for testing/compatibility > > disadvantages: > 1. people may have 'baselines' that contain \r\n terminators, which > will then be different from JDOM's default output. > 2. there may be some (obscure) protocols that require \r\n terminators > and users of JDOM2 will have to override the EOL to be '\r\n' for those. > > Anyone have comments/suggestions? > > Rolf > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > From jdom at tuis.net Tue Nov 15 20:53:26 2011 From: jdom at tuis.net (Rolf) Date: Tue, 15 Nov 2011 23:53:26 -0500 Subject: [jdom-interest] StAX support In-Reply-To: <1321383864.80296.YahooMailNeo@web161019.mail.bf1.yahoo.com> References: <4EC076F8.4020900@tuis.net> <1321297752.26405.YahooMailNeo@web161019.mail.bf1.yahoo.com> <32132bf778f11cab23922ec3b17d6651@tuis.net> <1321383864.80296.YahooMailNeo@web161019.mail.bf1.yahoo.com> Message-ID: <4EC341C6.1060009@tuis.net> Thanks for the feedback. I will further investigate the DTD issues, but it is good to hear that you think this is reasonable. I think it makes sense to make it as woodstox friendly as possible, so I will make the effort on it. Right now the implementation is relatively complete, with a reasonable set of JUnit tests. It is nice that I have been able to get almost identical test coverage for the StAX and XML outputters. This makes it feel stable even though it is really new. Anyway, thanks for your input. 
Rolf On 15/11/2011 2:04 PM, Tatu Saloranta wrote: > (apologies for messed formatting -- yahoo mail editor is bit odd) > > > ----- Original Message ----- > >> hi Tatu > >> In my mind I think it would be reasonable to support StAX source/sink in JDOM with the following conditions: > >> Input: >> - JDOM will ignore DTD events unless explicitly configured to receive them. If they are expected, they must be a full DTD "" not just an 'internal subset' or other invalid value. (this eliminates woodstox as a 'supported' parser I think as it only provides the internal subset), and the internal Java6 implementation creates a partial/truncated doctype" > - JDOM will treat SPACE and CHARACTERS events identically > - the XML*Reader must be configured to provide CDATA events, otherwise > JDOM will never know. > - JDOM can process JDOM fragments from partially processed XML*Readers as > long as they are on a logical event (i.e. excluding END_ELEMENT, > END_DOCUMENT, and the likes). > > Output: > - JDOM can output all of it's content types, but special handling for > EntityRef is required (not sure of all the details yet). > - DocType content will be output as a single String " SYSTEM .... [ ... ]>" > - JDOM content can be output as fragments to partially-written XML*Writers > on the condition that the writer is at an appropriate state before the JDOM > write happens. > > > I think the behaviour of the various StAX parsers/libraries is consistent > enough to provide a reasonable base for the above restrictions.... > > Any 'expert' observations/criticisms/suggestions? > > Rolf > ----- > > > Looks reasonable to me. The only question I have is wrt DTD. 
Javadocs for XMLStreamReader.getText() state: > > "Returns the current value of the parse event as a string, > this returns the string value of a CHARACTERS event, > returns the value of a COMMENT, the replacement value > for an ENTITY_REFERENCE, the string value of a CDATA section, > the string value for a SPACE event, > or the String value of the internal subset of the DTD." > > which is why Woodstox returns the internal DTD subset, as per specification. But as Stax 1.0 DTD handling is crippled basically, I don't really care deeply either way -- I did specify Stax2 extension API (see http://woodstox.codehaus.org/4.1.0/javadoc/index.html under 'Stax2'), which is implemented by Woodstox and Aalto, and it patches all issues I found with the 'vanilla' Stax API. > > I think it is a good idea to support fragment handling, and fine to just drop CHARACTERS/SPACE distinction. > > -+ Tatu +- > From jdom at tuis.net Tue Nov 15 20:57:54 2011 From: jdom at tuis.net (Rolf) Date: Tue, 15 Nov 2011 23:57:54 -0500 Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6 In-Reply-To: <716973aed0ca67fc72c533a66ebaf276@tuis.net> References: <4EAE005C.6020608@tuis.net> <716973aed0ca67fc72c533a66ebaf276@tuis.net> Message-ID: <4EC342D2.6070401@tuis.net> November 18th is coming up pretty soon. Please speak up if you have any comments, suggestions, or concerns. Thanks Rolf On 31/10/2011 9:26 AM, Rolf Lear wrote: > > I should add a time-line here. > > I think I will sit on this for a couple of weeks... Say Friday the 18th - > three weeks. > > At that point I will summarize all the responses... and between now and > then I will also see if I can come up with a more detailed list of what the > implications for supporting Java5 are... > > Then we can make a more informed decision. > > A third option would be to only officially support Java6, but also put > together a document on how to make it work with Java5. 
> > Rolf > > On Sun, 30 Oct 2011 21:56:44 -0400, Rolf wrote: >> Hi all. >> >> ... >> >> So, as a poll: >> >> * Does anyone have a realistic need to run a future JDOM2 on Java5? >> * If so, could you add additional jars to your classpath just to make >> JDOM2 work? >> * Any comments, suggestions. >> >> Currently I feel that it is reasonable to set Java6 as a minumum and not > >> even bother trying to think about Java5 issues... anyone disagree? >> >> Rolf >> _______________________________________________ >> To control your jdom-interest membership: >> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > From jdom at tuis.net Tue Nov 15 21:07:32 2011 From: jdom at tuis.net (Rolf) Date: Wed, 16 Nov 2011 00:07:32 -0500 Subject: [jdom-interest] Release jdom-2.x-2011.11.15.23.28.zip Message-ID: <4EC34514.5090806@tuis.net> Hi all. I have done some 'administration' and there are some updates. In summary though: 1. I have put together a JDOM2 Features page: https://github.com/hunterhacker/jdom/wiki/JDOM2-Features 2. I have released a snapshot of JDOM2 with the features as discussed above: https://github.com/downloads/hunterhacker/jdom/jdom-2.x-2011.11.15.23.28.zip 3. I have settled on http://hunterhacker.github.com/jdom/jdom2/ as a good location/entry point for JDOM2 information. Please give the code a whirl. 
Check out the JavaDOC, JUnit tests, and Coverage links: http://hunterhacker.github.com/jdom/jdom2/apidocs/index.html http://hunterhacker.github.com/jdom/jdom2/junit/index.html http://hunterhacker.github.com/jdom/jdom2/coverage/index.html Thanks Rolf From olivier.jaquemet at jalios.com Wed Nov 16 05:47:34 2011 From: olivier.jaquemet at jalios.com (Olivier Jaquemet) Date: Wed, 16 Nov 2011 14:47:34 +0100 Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6 In-Reply-To: <4EC342D2.6070401@tuis.net> References: <4EAE005C.6020608@tuis.net> <716973aed0ca67fc72c533a66ebaf276@tuis.net> <4EC342D2.6070401@tuis.net> Message-ID: <4EC3BEF6.5010101@jalios.com> Hello Rolf, As far as we are concerned in my company, we are still evaluating which Java version to support in our next product release. Some of our clients are still stuck on older appserver release which still requires Java 5 (eg WebSphere 6.1). But I would completely understand JDOM 2 to require Java 6, and this would simply be another good reason to provide support only starting from Java 6. Thank you for asking. Regards, Olivier On 16/11/2011 05:57, Rolf wrote: > November 18th is coming up pretty soon. > > Please speak up if you have any comments, suggestions, or concerns. > > Thanks > > Rolf > > On 31/10/2011 9:26 AM, Rolf Lear wrote: >> >> I should add a time-line here. >> >> I think I will sit on this for a couple of weeks... Say Friday the >> 18th - >> three weeks. >> >> At that point I will summarize all the responses... and between now and >> then I will also see if I can come up with a more detailed list of >> what the >> implications for supporting Java5 are... >> >> Then we can make a more informed decision. >> >> A third option would be to only officially support Java6, but also put >> together a document on how to make it work with Java5. >> >> Rolf >> >> On Sun, 30 Oct 2011 21:56:44 -0400, Rolf wrote: >>> Hi all. >>> >>> ... 
>>> >>> So, as a poll: >>> >>> * Does anyone have a realistic need to run a future JDOM2 on Java5? >>> * If so, could you add additional jars to your classpath just to make >>> JDOM2 work? >>> * Any comments, suggestions. >>> >>> Currently I feel that it is reasonable to set Java6 as a minumum and >>> not >> >>> even bother trying to think about Java5 issues... anyone disagree? >>> >>> Rolf >>> _______________________________________________ >>> To control your jdom-interest membership: >>> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com >> > > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > From jdom at tuis.net Wed Nov 16 22:39:46 2011 From: jdom at tuis.net (Rolf) Date: Thu, 17 Nov 2011 01:39:46 -0500 Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6 In-Reply-To: <4EC342D2.6070401@tuis.net> References: <4EAE005C.6020608@tuis.net> <716973aed0ca67fc72c533a66ebaf276@tuis.net> <4EC342D2.6070401@tuis.net> Message-ID: <4EC4AC32.4090006@tuis.net> I have been looking in to the implications of supporting Java5. Here is a list of changes I have had to make to get the support in: 1. DescendantIterator uses ArrayDeque - easy fix. 2. Lots of places use Arrays.copyOf(...) - created a new ArrayCopy utility class - an OK fix. 3. XMLConstants.W3C_XML_SCHEMA_NS_URI - does not exist in Java5 - easy fix... hard-code it 4. XMLConstants.FEATURE_SECURE_PROCESSING missing StAX has proven to be the real problem. Specifically, the stand-alone (pre Java6) StAX library is only specified to have optional support for the method I use to load up files. It's not a train-smash, there's an alternative way.... I just have to change all the JUnit tests from loading from a 'Source' to loading from a FileReader I think, all being said and done, that the code will work in Java5. 
The option of supporting Java6 officially, but having good instructions for making everything work in Java5 is realistic. Currently the instructions would be something like: 1. everything except StAX will work just fine. ... If you want StAX, it comes in two parts, the official API, and an implementation. The API is available in two places, either the official JSR at http://sjsxp.java.net/#downloads or alternatively the xml-apis.jar which is part of apache (and is part of the JDOM2 repository) The reference implementation of StAX is available from http://sjsxp.java.net/#downloads as well. You have to download and run a single .class file SJSXP.class. Alternatively, download the woodstox StAX implementation. Conclusion, it all seems to be quite reasonable to make Java5 work. I still am reluctant to make it officially supported. I think though with some disciplined development the compatibility can be established, and maintained. I don't particularly like having to re-create the java.util.Arrays functionality, but its not a deal-breaker. I think I will commit the code changes though, even if it is just to get them 'on record' (and clear them out of my development environment so I can do other things). Committing the code change is not intended to be an endorsement of Java5 support though! It is a big commit. I think that's enough investigation to make a more informed decision about Java5 support. Currently using Java5 about 250 test cases are failing, but those I believe will pass again if/when I change the tests to use a FileReader for StAX (instead of the unsupported Source). Rolf On 15/11/2011 11:57 PM, Rolf wrote: > November 18th is coming up pretty soon. > > Please speak up if you have any comments, suggestions, or concerns. > > Thanks > > Rolf > > On 31/10/2011 9:26 AM, Rolf Lear wrote: >> >> I should add a time-line here. >> >> I think I will sit on this for a couple of weeks... Say Friday the 18th - >> three weeks. 
>> >> At that point I will summarize all the responses... and between now and >> then I will also see if I can come up with a more detailed list of >> what the >> implications for supporting Java5 are... >> >> Then we can make a more informed decision. >> >> A third option would be to only officially support Java6, but also put >> together a document on how to make it work with Java5. >> >> Rolf >> >> On Sun, 30 Oct 2011 21:56:44 -0400, Rolf wrote: >>> Hi all. >>> >>> ... >>> >>> So, as a poll: >>> >>> * Does anyone have a realistic need to run a future JDOM2 on Java5? >>> * If so, could you add additional jars to your classpath just to make >>> JDOM2 work? >>> * Any comments, suggestions. >>> >>> Currently I feel that it is reasonable to set Java6 as a minumum and not >> >>> even bother trying to think about Java5 issues... anyone disagree? >>> >>> Rolf >>> _______________________________________________ >>> To control your jdom-interest membership: >>> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com >> > > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > From jdom at tuis.net Thu Nov 17 08:01:59 2011 From: jdom at tuis.net (Rolf Lear) Date: Thu, 17 Nov 2011 11:01:59 -0500 Subject: [jdom-interest] JDOM parser reuse memory problem In-Reply-To: References: <4EBC109A.9020908@saxonica.com> <4EBCDDDA.7080208@saxonica.com> <4EBD0551.3020205@tuis.net> Message-ID: <4EC52FF7.5030509@tuis.net> Randall, that depends on a few things.... 1. I am looking at the source code, and I could have interpreted it wrong, but I don't think so 2. whether you are reusing the SAX Parser instance - setReuseParser(true) 3. whether I understand your question right.... 
Firstly, there are two ways to read your question: does the SAXBuilder somehow hold references to the parsed Document; or does the parsed Document refer back to the SAXBuilder somehow?

Answering the first reading first....

In a normal 'build', the code creates a SAX Parser/XMLReader instance, and a SAX ContentHandler to handle the SAX events. The ContentHandler created is an instance of SAXHandler, and that contains references to the Document that is parsed. When the build is completed, the Document instance is retrieved from the SAXHandler, and returned to the caller (you). The SAXHandler and the XMLReader are then de-referenced and can be garbage-collected. In this normal case the answer would be 'there is no reference from the SAXBuilder to the Document'.

If however you configure the SAXBuilder to reuse the SAX Parser/XMLReader, then you run in to the bug you first alerted us to... At the end of the build process the SAXBuilder does not de-reference the XMLReader, and keeps it for the next (potential) build. Unfortunately, that XMLReader contains references to the ContentHandler it last used (the SAXHandler). The SAXHandler has references to the last Document it handled. In other words, if you re-use the XMLReader, then you also keep a chain of references that link to the Document you last parsed.

The second reading ... does a parsed Document refer back to its SAXBuilder? That is easy to answer: no, it does not. There is no reference from the Document back to the SAXBuilder, and Elements only reference back as far as the parent Document.

In a more generalized answer, the only issue I can see with having a pool of SAXBuilders is that, if you reuse parsers, you will 'carry' the most recently parsed document from each SAXBuilder until that builder is used again.

Again though, I have to ask, is there something you have seen which indicates there may be a back-reference to the SAXBuilder?
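The reference chain described above (reused XMLReader, to its last ContentHandler, to whatever that handler built) can be observed with the plain JDK SAX API, without JDOM. The following is only a minimal sketch: BuildingHandler is a hypothetical stand-in for JDOM's SAXHandler, the class and method names are made up for illustration, and it assumes the JDK's bundled SAX parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class HandlerRetentionDemo {

    /** Hypothetical stand-in for JDOM's SAXHandler: it holds whatever it built. */
    static final class BuildingHandler extends DefaultHandler {
        final StringBuilder built = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            built.append(ch, start, length);
        }
    }

    /** Parses a small document with the given handler and returns the reader that was used. */
    private static XMLReader parseWith(BuildingHandler handler) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader("<root>some large document</root>")));
            return reader;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** True if the reader still points at the handler once parsing has finished. */
    static boolean readerKeepsHandlerAfterParse() {
        BuildingHandler handler = new BuildingHandler();
        XMLReader reader = parseWith(handler);
        return reader.getContentHandler() == handler;
    }

    /** True if installing a fresh, empty handler drops the reference to the old one. */
    static boolean readerForgetsHandlerAfterReset() {
        BuildingHandler handler = new BuildingHandler();
        XMLReader reader = parseWith(handler);
        reader.setContentHandler(new DefaultHandler());
        return reader.getContentHandler() != handler;
    }

    public static void main(String[] args) {
        System.out.println("handler kept after parse: " + readerKeepsHandlerAfterParse());
        System.out.println("handler dropped after reset: " + readerForgetsHandlerAfterReset());
    }
}
```

Replacing the handlers on a reused reader after each build is one possible way to break the chain; whether that is the right fix for SAXBuilder is exactly what the open issue is meant to decide.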
Rolf If you are *not* reusing the parser then both the parser and the , then SAXBuilder 'remembers' the XMLReader instance On 17/11/2011 10:25 AM, Randall Theobald wrote: > I have a quick question related to pooling SAXBuilders. Can I release the > SAXBuilder back to the pool immediately after the .build method is called? > In other words, there's no tie back to the builder from the resulting > Document or Element objects, right? > > Randall Theobald > > Performance: WebSphere > Business Process > Management& > Connectivity > > IBM Software Group randallt at us.ibm.com > > Austin, TX 512-286-8870 t/l: > 363-8870 > > > > > > > > > > > From: Rolf Lear > To: Michael Kay, > Cc: jdom-interest at jdom.org > Date: 11/11/2011 05:31 AM > Subject: Re: [jdom-interest] JDOM parser reuse memory problem > Sent by: jdom-interest-bounces at jdom.org > > > > On 11/11/2011 3:33 AM, Michael Kay wrote: >> On 10/11/2011 18:51, Rolf Lear wrote: >>> Hi Randall, Michael. >>> >>> It's an interesting observation... and I can see the implications. I >>> would >>> like to take a closer look at at, but that may take a little while. >>> >>> I filed https://github.com/hunterhacker/jdom/issues/52 >>> >>> 'Off the cuff' I can think of one work-around and a few solutions (in >>> addition to what Michael has suggested) >>> >>> 1. immediately after parsing your real document you then parse a >>> dummy/small/inmemory document (even invalid - and catch the exception). >>> 2. Currently when you do-no reuse the parser, it goes back to 'first >>> principals' and queries JAXP, etc. to find a parser instance Instead it >>> could 'cache' the parser 'source' after the first time, and then just >>> create a new instance, instead of doing all the class-based lookups... >> Ouch. Creating a new parser to parse a small document is a cost that >> it's nice to avoid, but it isn't going to kill you. 
Going through the >> JAXP factory process to get a new ParserFactory is a monstrous cost >> that can dominate all other processing - and reusing the factory costs >> nothing. >> >> Michael Kay >> Saxonica >> > Not sure what you are saying... are you agreeing that the 'ouch' problem > is the one it has at the moment, or the suggestion to skip the JAXB > processing on subsequent non-reuse-parser parses? > > I have not yet had a close look at the problem... the potential option > of not going back to first-principles on subsequent parses may not be > (easily) possible.... Unless Randall can convince me otherwise, I'm > going to finish working on some StAX outputter code I am embroiled in, > and then look at it. > > Rolf > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > > > From jdom at tuis.net Thu Nov 17 08:33:40 2011 From: jdom at tuis.net (Rolf Lear) Date: Thu, 17 Nov 2011 11:33:40 -0500 Subject: [jdom-interest] JDOM parser reuse memory problem In-Reply-To: <4EC53585.8000103@atos.net> References: <4EBC109A.9020908@saxonica.com> <4EBCDDDA.7080208@saxonica.com> <4EBD0551.3020205@tuis.net> <4EC52FF7.5030509@tuis.net> <4EC53585.8000103@atos.net> Message-ID: <4EC53764.9040607@tuis.net> On 17/11/2011 11:25 AM, BIHANIC Laurent wrote: > Hi > > Le 17/11/11 17:01, Rolf Lear a ?crit : >> In a more generalized answer, the only issue I can see with having a pool of >> SAXBuilders is that, if you reuse parsers, you will 'carry' the most recently >> parsed document from each SAXBuilder until that builder is used again. > It should not: the reference to the (being) built document is held by > SAXHander and the reference to SAXHandler (contentHandler) is explicitely set > to null at the end of the build() method in a finally clause. > > We did this because, in the past, we encountered some JVMs that were reluctant > to garbage-collect the SAXHandler otherwise. 
>
> Regards,
>
> Laurent
>
>

Hi Laurent.

Yeah, the code in the method indicates that (I was not aware it was you who put that in).

There is a bug though, and it could explain some confusion.

The problem is that the XMLReader has had its various handlers set to the SAXHandler (Content, DTD, Error, etc.) and when you re-use the XMLReader, you in effect keep a reference to the SAXHandler.

Setting the SAXHandler to null at the end of the build does not cause the XMLReader to forget its handlers.

It is an issue that Randall identified a week or so ago.

Rolf

From jdom at tuis.net Thu Nov 17 09:13:41 2011
From: jdom at tuis.net (Rolf Lear)
Date: Thu, 17 Nov 2011 12:13:41 -0500
Subject: [jdom-interest] JDOM parser reuse memory problem
In-Reply-To: <4EC53B80.1030704@atos.net>
References: <4EBC109A.9020908@saxonica.com> <4EBCDDDA.7080208@saxonica.com> <4EBD0551.3020205@tuis.net> <4EC52FF7.5030509@tuis.net> <4EC53585.8000103@atos.net> <4EC53764.9040607@tuis.net> <4EC53B80.1030704@atos.net>
Message-ID: <4EC540C5.8000003@tuis.net>

On 17/11/2011 11:51 AM, BIHANIC Laurent wrote:
> On 17/11/11 17:33, Rolf Lear wrote:
>> There is a bug though, and it could explain some confusion.
>>
>> The problem is the the XMLReader has had it's various handlers set to the
>> SAXHandler (Content, DTD, Error, etc.) and when you re-use the XMLReader, you
>> in effect keep a reference to the SAXHandler.
>>
>> Setting the SAXHandler to null at the end of the build does not cause the
>> XMLReader to forget it's handlers.
>>
>> It is an issue that Randall identified a week or so ago.
> OK. So maybe we should explicitly reset the Document reference in SAXHandler
> (by adding a call to a protected method in the finally clause) and remove this
> old fix.
>
> Regards,
>
> Laurent
>
> --
>

When Randall alerted us to the issue I filed an 'issue' for it.
There are a few options that have already been suggested, and it will take some experimentation to figure out which one is the best (or if there are other options). I just have not yet gotten around to it. It needs more time/investigation to get the right solution.

On the other hand, this back-and-forth has inspired me to update the issue with the suggestions-so-far: https://github.com/hunterhacker/jdom/issues/52

Rolf

From jdom at tuis.net Thu Nov 17 16:51:46 2011
From: jdom at tuis.net (Rolf)
Date: Thu, 17 Nov 2011 19:51:46 -0500
Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6
In-Reply-To: <4EC4AC32.4090006@tuis.net>
References: <4EAE005C.6020608@tuis.net> <716973aed0ca67fc72c533a66ebaf276@tuis.net> <4EC342D2.6070401@tuis.net> <4EC4AC32.4090006@tuis.net>
Message-ID: <4EC5AC22.1070008@tuis.net>

Hi all.

As a further update, I have completed the Java5 compatibility for the code in its current state.

In order to make the process work I had to make some decisions about language levels and compile levels. I discovered that my personal coding style has become fairly Java6-centric, especially with respect to @Override annotations. To make the code fully Java5 compatible I would have to make code changes to almost every file. I thus took the decision to maintain the Java6 code style, which implies the code has to be compiled with a Java6 JDK, but with a bytecode target of Java5.

I have thus compiled JDOM2 with JDK6 to JDK5-level bytecode, and then run that bytecode through the JUnit test harness using Java5, 6, and 7 runtimes. All tests that are expected to pass did (I expect some Jaxen-related tests to fail). Interestingly, the Jaxen tests that fail in Java5 and 6 now pass in Java7 ... ;-)

The bottom line is that we can make JDOM2 run in Java5. To make it compile with Java5 will take more work.
To re-iterate the changes I have had to make, because they indicate the sorts of limitations that we may run in to if we officially support Java5:

1. DescendantIterator uses ArrayDeque - fixed with ListIterator
2. Lots of places use Arrays.copyOf(...) - created a new ArrayCopy utility class
3. XMLConstants.W3C_XML_SCHEMA_NS_URI - does not exist in Java5 - hard-coded it (but issue #38 comes to mind)
4. XMLConstants.FEATURE_SECURE_PROCESSING missing - hard-coded it.
5. StAX does not support Source-based input in Java5, need to use Streams or Readers only - replaced all Sources with Readers in the test harness. http://java.net/jira/browse/SJSXP-38
6. Java5 has a major bug in org.xml.sax (https://issues.apache.org/bugzilla/show_bug.cgi?id=38316 and https://issues.apache.org/jira/browse/XERCESJ-1261), which basically meant that I had to write my own implementation of Attributes2 to get some tests to run in the SAXHandler.

To make the code compile cleanly with Java5 would be a bigger exercise, I think.

Now that I have done this work I think I am more comfortable saying we can keep JDOM2 running on Java5 without too much effort, but I still do not want to say we officially support Java5. I think we can put together a how-to on making it all work.

Since I was going through the code versions I thought it would be interesting to run the performance benchmark against the various combinations of code compliance and runtime version. I have put together a web-page for it: http://hunterhacker.github.com/jdom/jdom2/performanceJDK.html

It is very interesting for a number of reasons:

1. Java5 is much slower.
2. I realized that I have been running the perf tests using the Java7 runtime for a while... because XPath with Java7 is more than twice as fast as Java6... which makes my other performance page a little useless now...
3. There are other minor discrepancies that are interesting none-the-less.
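For readers wondering what item 2 in the list of Java5 changes amounts to: Arrays.copyOf(...) only arrived in Java6, so a Java5-compatible replacement has to be built on System.arraycopy. The sketch below shows one way that could be done. The class name ArrayCopySketch and its methods are illustrative assumptions, not the actual ArrayCopy class in the JDOM2 source, which may well differ.

```java
import java.lang.reflect.Array;

/** Hypothetical Java5-compatible substitute for the Arrays.copyOf(...) overloads. */
final class ArrayCopySketch {

    private ArrayCopySketch() {
        // static utility, not meant to be instantiated
    }

    /** Copies source into a new array of the same component type, truncating or null-padding. */
    @SuppressWarnings("unchecked")
    static <T> T[] copyOf(T[] source, int newLength) {
        T[] copy = (T[]) Array.newInstance(source.getClass().getComponentType(), newLength);
        System.arraycopy(source, 0, copy, 0, Math.min(source.length, newLength));
        return copy;
    }

    /** Primitive overload: truncates, or pads with zeros. */
    static int[] copyOf(int[] source, int newLength) {
        int[] copy = new int[newLength];
        System.arraycopy(source, 0, copy, 0, Math.min(source.length, newLength));
        return copy;
    }

    public static void main(String[] args) {
        String[] grown = copyOf(new String[] {"a", "b"}, 3);
        System.out.println(java.util.Arrays.toString(grown));
    }
}
```

Each primitive type needs its own overload, which is presumably part of why re-creating the java.util.Arrays functionality feels like a chore.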
In light of the current state of the code and the results I have, I think I am comfortable that we can make a good decision about supported JDKs. Given that I set tomorrow as a decision deadline though, and that the ramifications are not going to be massive from a design perspective, I think I should extend the deadline a little further... perhaps next Friday, the 25th. Rolf On 17/11/2011 1:39 AM, Rolf wrote: > I have been looking in to the implications of supporting Java5. > > Here is a list of changes I have had to make to get the support in: > > 1. DescendantIterator uses ArrayDeque - easy fix. > 2. Lots of places use Arrays.copyOf(...) - created a new ArrayCopy > utility class - an OK fix. > 3. XMLConstants.W3C_XML_SCHEMA_NS_URI - does not exist in Java5 - easy > fix... hard-code it > 4. XMLConstants.FEATURE_SECURE_PROCESSING missing > > > StAX has proven to be the real problem. Specifically, the stand-alone > (pre Java6) StAX library is only specified to have optional support for > the method I use to load up files. It's not a train-smash, there's an > alternative way.... I just have to change all the JUnit tests from > loading from a 'Source' to loading from a FileReader > > > I think, all being said and done, that the code will work in Java5. The > option of supporting Java6 officially, but having good instructions for > making everything work in Java5 is realistic. > > Currently the instructions would be something like: > 1. everything except StAX will work just fine. > ... > > If you want StAX, it comes in two parts, the official API, and an > implementation. > > The API is available in two places, either the official JSR at > http://sjsxp.java.net/#downloads or alternatively the xml-apis.jar which > is part of apache (and is part of the JDOM2 repository) > > The reference implementation of StAX is available from > http://sjsxp.java.net/#downloads as well. You have to download and run a > single .class file SJSXP.class. 
> > Alternatively, download the woodstox StAX implementation. > > > > > Conclusion, it all seems to be quite reasonable to make Java5 work. I > still am reluctant to make it officially supported. I think though with > some disciplined development the compatibility can be established, and > maintained. > > I don't particularly like having to re-create the java.util.Arrays > functionality, but its not a deal-breaker. > > I think I will commit the code changes though, even if it is just to get > them 'on record' (and clear them out of my development environment so I > can do other things). Committing the code change is not intended to be > an endorsement of Java5 support though! It is a big commit. > > I think that's enough investigation to make a more informed decision > about Java5 support. Currently using Java5 about 250 test cases are > failing, but those I believe will pass again if/when I change the tests > to use a FileReader for StAX (instead of the unsupported Source). > > Rolf > > From jdom at tuis.net Fri Nov 18 10:22:15 2011 From: jdom at tuis.net (Rolf Lear) Date: Fri, 18 Nov 2011 13:22:15 -0500 Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6 In-Reply-To: <8E2ABA60D9C51C4894395E4D3B2128140335533D@CINMLVEM15.e2k.ad.ge.com> References: <4EAE005C.6020608@tuis.net><716973aed0ca67fc72c533a66ebaf276@tuis.net><4EC342D2.6070401@tuis.net> <4EC4AC32.4090006@tuis.net> <4EC5AC22.1070008@tuis.net> <8E2ABA60D9C51C4894395E4D3B2128140335533D@CINMLVEM15.e2k.ad.ge.com> Message-ID: Thanks Cecil. I'm replying to this not just to say thanks, but because sometimes the mailing list gets stuck, and I've got this message from you but it's been two hours and it has not come through the mailing list system, and not shown up in the markmail archive either. Other mails messages may be stuck too. Perhaps you are not subscribed properly, or perhaps the list is stuck. Jason, CC'd you. 

Thanks

Rolf

On Fri, 18 Nov 2011 11:06:31 -0500, "New, Cecil (GE Aviation, US)" wrote:
> I know our company is very conservative about changing java versions.
> But even we have been using Java 6 for quite some time. Everyone is
> more security conscious these days and we are much better at using
> supported products. Java 5 end of life was October 2009... just some
> data points to consider
>

From jhunter at servlets.com Fri Nov 18 10:31:54 2011
From: jhunter at servlets.com (Jason Hunter)
Date: Fri, 18 Nov 2011 10:31:54 -0800
Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6
In-Reply-To:
References: <4EAE005C.6020608@tuis.net><716973aed0ca67fc72c533a66ebaf276@tuis.net><4EC342D2.6070401@tuis.net> <4EC4AC32.4090006@tuis.net> <4EC5AC22.1070008@tuis.net> <8E2ABA60D9C51C4894395E4D3B2128140335533D@CINMLVEM15.e2k.ad.ge.com>
Message-ID: <36917031-D7B7-480F-9324-FA80B444FD6B@servlets.com>

Rolf, your mail made it to the list:

http://markmail.org/message/n73dclwnimsgkqqn

Cecil, your mail didn't because you're subscribed as @ae.ge.com and
posted as @ge.com. I force-subscribed @ge.com and marked it not to get
mail, so now you can post as either.

For people with multiple email addresses, the trick is to subscribe as
all of them so the anti-spam filters let you through no matter how you
send, but mark only one as actually receiving mail. There's a NOMAIL
flag you can set.

-jh-

On Nov 18, 2011, at 10:22 AM, Rolf Lear wrote:
>
> Thanks Cecil.
>
> I'm replying to this not just to say thanks, but because sometimes the
> mailing list gets stuck, and I've got this message from you but it's been
> two hours and it has not come through the mailing list system, and not
> shown up in the markmail archive either. Other mail messages may be stuck
> too.
>
> Perhaps you are not subscribed properly, or perhaps the list is stuck.
>
> Jason, CC'd you.
>
> Thanks
>
> Rolf
>
> On Fri, 18 Nov 2011 11:06:31 -0500, "New, Cecil (GE Aviation, US)"
> wrote:
>> I know our company is very conservative about changing java versions.
>> But even we have been using Java 6 for quite some time. Everyone is
>> more security conscious these days and we are much better at using
>> supported products. Java 5 end of life was October 2009... just some
>> data points to consider
>>

From jdom at tuis.net Fri Nov 18 16:32:25 2011
From: jdom at tuis.net (Rolf)
Date: Fri, 18 Nov 2011 19:32:25 -0500
Subject: [jdom-interest] JDOM parser reuse memory problem
In-Reply-To: <4EC540C5.8000003@tuis.net>
References: <4EBC109A.9020908@saxonica.com> <4EBCDDDA.7080208@saxonica.com> <4EBD0551.3020205@tuis.net> <4EC52FF7.5030509@tuis.net> <4EC53585.8000103@atos.net> <4EC53764.9040607@tuis.net> <4EC53B80.1030704@atos.net> <4EC540C5.8000003@tuis.net>
Message-ID: <4EC6F919.1010207@tuis.net>

I have updated the issue with some performance numbers for some
different conditions.

Have a look at: https://github.com/hunterhacker/jdom/issues/52

It seems to indicate that fixing the 'back to raw JAXP for each loop'
will only save a little time, but parser reuse saves a lot.

Need to implement both options, I think: SAXFactory caching as well as
better memory management on Parser reuse.

Out of interest, I thought the default setting for parser reuse was
'false', but it is true. XMLReaders will be reused unless you explicitly
call setReuseParser(false).

This in turn means that my comments about 'normal' process should be
reversed: the normal case for this bug condition is that we keep a
reference from the SAXBuilder to the Document for as long as the
SAXBuilder is active, and not used to rebuild another document.
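
To make the 'raw JAXP for each loop' versus 'parser reuse' distinction
concrete, here is a minimal sketch that uses only the JDK's JAXP and SAX
classes. It is not JDOM's SAXBuilder code: ParserReuseSketch,
CountingHandler and the method names are hypothetical, made up for this
illustration, and the handler only counts elements instead of building a
Document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only: none of these names come from JDOM.
final class ParserReuseSketch {

    /** Stand-in for a real content handler: it just counts elements. */
    static final class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    /** Asks JAXP for a brand new XMLReader (the step worth avoiding). */
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("JAXP could not supply a parser", e);
        }
    }

    /** Parses one document with the given reader; returns the element count. */
    static int parse(XMLReader reader, String xml) {
        CountingHandler handler = new CountingHandler();
        reader.setContentHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.elements;
    }

    /** Goes back to raw JAXP for every document. */
    static int parseFresh(String xml, int times) {
        int total = 0;
        for (int i = 0; i < times; i++) {
            total += parse(newReader(), xml);
        }
        return total;
    }

    /** Creates one XMLReader and reuses it for every document. */
    static int parseReusing(String xml, int times) {
        XMLReader shared = newReader();
        int total = 0;
        for (int i = 0; i < times; i++) {
            total += parse(shared, xml);
        }
        return total;
    }

    public static void main(String[] args) {
        String xml = "<root><a/><b/></root>";
        System.out.println("fresh:  " + parseFresh(xml, 1000));
        System.out.println("reused: " + parseReusing(xml, 1000));
    }
}
```

Both loops see the same SAX events; the only difference is how often a
parser is created. The sketch deliberately leaves out timing, so the
numbers in the issue linked above remain the ones to go by.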

Thanks

Rolf

From jdom at tuis.net Sat Nov 19 20:12:49 2011
From: jdom at tuis.net (Rolf)
Date: Sat, 19 Nov 2011 23:12:49 -0500
Subject: [jdom-interest] JDOM parser reuse memory problem
In-Reply-To: <4EC6F919.1010207@tuis.net>
References: <4EBC109A.9020908@saxonica.com> <4EBCDDDA.7080208@saxonica.com> <4EBD0551.3020205@tuis.net> <4EC52FF7.5030509@tuis.net> <4EC53585.8000103@atos.net> <4EC53764.9040607@tuis.net> <4EC53B80.1030704@atos.net> <4EC540C5.8000003@tuis.net> <4EC6F919.1010207@tuis.net>
Message-ID: <4EC87E41.5080509@tuis.net>

Hi all.

I am looking to run some ideas past the group. I see a number of
problems with the SAXBuilder as it currently is. It is somewhat hard to
describe them all, but, the bottom line is that I think the API should
be changed for it in a smallish way that will affect people who use a
custom SAXHandler, or those who hard-code a SAXParser Driver classname
in the SAXBuilder constructor. I believe the vast majority of people use
the default constructor, and do not subclass the SAXHandler so this
change will affect only a small subset of JDOM users.

So, here are the problems I see, in addition to the bug related to
long-living memory references.


Problem 1: SAXParser creation

JDOM uses 3 mechanisms to create a SAX parser:
1. if the user specifies a specific SAX 'Driver' classname
2. else falls-back to JAXP
3. else falls back to a 'default' SAX Driver (xerces)

I believe that the 'default' fall-back should be removed because if JAXP
fails there's nothing. At minimum, JAXP will find the parser embedded in
the Java runtime, and the 'default' fallback will never happen. Put
another way, if JAXP fails, there is no reason to expect
that the 'default' "org.apache.xerces.parsers.SAXParser" will work
(because if you have org.apache.xerces.parsers.SAXParser then you also
have a working JAXP parser....)

I also believe the user-specified 'driver' mechanism should be replaced
with a straight XMLReaderFactory instance.
This makes the JDOM user responsible for creating the factory. It also
adds the ability for the user to have just a single Factory instance and
not have JDOM creating a new instance each time a new SAXBuilder is
created. This will give the user the opportunity to improve performance
in a way that JDOM cannot. XMLReaderFactory is part of SAX2.0 and has
been in Java since at least Java 1.4. It is the 'correct' way to get an
XMLReader instance. Also, new JDOM users will not be confused by this
string value, whereas XMLReaderFactory is a real, standard, and well
documented entity.

Further, there should be no fallback mechanism: if the user manually
provides a XMLReaderFactory and it fails then it should all fail. If the
user uses JAXP (the default), and JAXP fails then we fail. In the Java5+
world JDOM should not need to be 'molly-coddling' the JAXP process.
Also, we should not be using such outdated mechanisms as direct SAX
driver classes.

This change would 'neaten' up the API for creating SAXBuilders:
1. you either use the 'normal' JAXP process, or...
2. you use the standard non-JAXP mechanism XMLReaderFactory


Problem 2: Parser reuse.

XMLReader reuse is much more efficient than creating a new parser for
each JDOM build. There have been a few attempts to improve the parser
reuse in JDOM, but it could be taken even further by only re-configuring
the XMLReader when the SAXBuilder configuration changes. In a typical
use where the configuration is unchanged between consecutive JDOM builds
then there does not need to be any reconfiguration at all.


Problem 3: The long-linked memory

The fix for this is probably going to need a 'reset' method on the
SAXHandler that de-references the Document that was last parsed. This in
turn will require an API change on SAXHandler.

Problem 4: SAXHandler sub-classing

SAXHandler subclassing allows for custom event handling, but, in order
to use a custom SAXHandler you also have to subclass SAXBuilder and
override the createContentHandler() method.
This is a cumbersome (and not well documented) mechanism.


What with these (at least) 4 issues with SAXBuilder it makes sense to
change the API slightly to accommodate the 'new' way of doing things.
This will impact the way that subclassing is done, and will impact those
who use a non-JAXP SAX parser.

If these changes (or others like them) need to happen (and I think they
do), then it makes sense to do it right, and comprehensively.

I am going to play with the code a little to get an idea of what can be
done, but I am looking for any ideas, suggestions, criticisms.

I have already made some changes affecting the JDOM2 API but I think
this could be one of those changes that makes a real difference (for the
better).

Rolf

On 18/11/2011 7:32 PM, Rolf wrote:
> I have updated the issue with some performance numbers for some different
> conditions.
>
> Have a look at: https://github.com/hunterhacker/jdom/issues/52
>
> It seems to indicate that fixing the 'back to raw JAXP for each loop'
> will only save a little time, but parser reuse saves a lot.
>
> Need to implement both options, I think: SAXFactory caching as
> well as better memory management on Parser reuse.
>
> Out of interest, I thought the default setting for parser reuse was
> 'false', but it is true. XMLReaders will be reused unless you explicitly
> call setReuseParser(false).
>
> This in turn means that my comments about 'normal' process should be
> reversed: the normal case for this bug condition is that we keep a
> reference from the SAXBuilder to the Document for as long as the
> SAXBuilder is active, and not used to rebuild another document.
>
> Thanks
>
> Rolf

From jdom at tuis.net Tue Nov 22 20:35:23 2011
From: jdom at tuis.net (Rolf)
Date: Tue, 22 Nov 2011 23:35:23 -0500
Subject: [jdom-interest] JDOM parser reuse memory problem
In-Reply-To: <4EC87E41.5080509@tuis.net>
References: <4EBC109A.9020908@saxonica.com> <4EBCDDDA.7080208@saxonica.com> <4EBD0551.3020205@tuis.net> <4EC52FF7.5030509@tuis.net> <4EC53585.8000103@atos.net> <4EC53764.9040607@tuis.net> <4EC53B80.1030704@atos.net> <4EC540C5.8000003@tuis.net> <4EC6F919.1010207@tuis.net> <4EC87E41.5080509@tuis.net>
Message-ID: <4ECC780B.4080501@tuis.net>

Hi again everyone.

I have been playing with SAXBuilder, trying to find a way to put it
together in such a way that it is still 'SAXBuilder' but improves parser
reuse, and still enables customization.

I think I have come up with a solution that is backward compatible for
'everyday' use (but compatibility is broken for people who have
sub-classed either SAXHandler or SAXBuilder - and some methods have been
deprecated on SAXBuilder and others have been renamed with the old names
deprecated too).

The performance improvement is bigger than I expected. See:
https://github.com/hunterhacker/jdom/issues/52#issuecomment-2844750
where you can see that the code now re-uses the parser and the whole
process completes in a quarter of the previous time.

The changes are hard to describe in one place, but I have put together
some documentation here:
http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/input/sax/package-summary.html#package_description

Finally, I have updated the performance page too:
http://hunterhacker.github.com/jdom/jdom2/performance.html

You can see that the SAX builder has now re-taken the lead from the StAX
process.

Thanks

Rolf

On Sat, 19 Nov 2011 23:12:49 -0500, Rolf wrote:
> Hi all.
>
> I am looking to run some ideas past the group. I see a number of
> problems with the SAXBuilder as it currently is. It is somewhat hard to
> describe them all, but, the bottom line is that I think the API should
> be changed for it in a smallish way that will affect people who use a
> custom SAXHandler, or those who hard-code a SAXParser Driver classname
> in the SAXBuilder constructor. I believe the vast majority of people use
> the default constructor, and do not subclass the SAXHandler so this
> change will affect only a small subset of JDOM users.
>
> So, here are the problems I see, in addition to the bug related to
> long-living memory references.
>
>
> Problem 1: SAXParser creation
>
> JDOM uses 3 mechanisms to create a SAX parser:
> 1. if the user specifies a specific SAX 'Driver' classname
> 2. else falls-back to JAXP
> 3. else falls back to a 'default' SAX Driver (xerces)
>
> I believe that the 'default' fall-back should be removed because if JAXP
> fails there's nothing. At minimum, JAXP will find the parser embedded in
> the Java runtime, and the 'default' fallback will never happen. Put
> another way, if JAXP fails, there is no reason to expect
> that the 'default' "org.apache.xerces.parsers.SAXParser" will work
> (because if you have org.apache.xerces.parsers.SAXParser then you also
> have a working JAXP parser....)
>
> I also believe the user-specified 'driver' mechanism should be replaced
> with a straight XMLReaderFactory instance. This makes the JDOM user
> responsible for creating the factory. It also adds the ability for the
> user to have just a single Factory instance and not have JDOM creating a
> new instance each time a new SAXBuilder is created. This will give the
> user the opportunity to improve performance in a way that JDOM cannot.
> XMLReaderFactory is part of SAX2.0 and has been in Java since at least
> Java 1.4. It is the 'correct' way to get an XMLReader instance.
> Also, new JDOM users will not be confused by this string value, whereas
> XMLReaderFactory is a real, standard, and well documented entity.
>
> Further, there should be no fallback mechanism: if the user manually
> provides a XMLReaderFactory and it fails then it should all fail. If the
> user uses JAXP (the default), and JAXP fails then we fail. In the Java5+
> world JDOM should not need to be 'molly-coddling' the JAXP process.
> Also, we should not be using such outdated mechanisms as direct SAX
> driver classes.
>
> This change would 'neaten' up the API for creating SAXBuilders:
> 1. you either use the 'normal' JAXP process, or...
> 2. you use the standard non-JAXP mechanism XMLReaderFactory
>
>
> Problem 2: Parser reuse.
>
> XMLReader reuse is much more efficient than creating a new parser for
> each JDOM build. There have been a few attempts to improve the parser
> reuse in JDOM, but it could be taken even further by only re-configuring
> the XMLReader when the SAXBuilder configuration changes. In a typical
> use where the configuration is unchanged between consecutive JDOM builds
> then there does not need to be any reconfiguration at all.
>
>
> Problem 3: The long-linked memory
>
> The fix for this is probably going to need a 'reset' method on the
> SAXHandler that de-references the Document that was last parsed. This in
> turn will require an API change on SAXHandler.
>
> Problem 4: SAXHandler sub-classing
>
> SAXHandler subclassing allows for custom event handling, but, in order
> to use a custom SAXHandler you also have to subclass SAXBuilder and
> override the createContentHandler() method. This is a cumbersome (and
> not well documented) mechanism.
>
>
> What with these (at least) 4 issues with SAXBuilder it makes sense to
> change the API slightly to accommodate the 'new' way of doing things.
> This will impact the way that subclassing is done, and will impact those
> who use a non-JAXP SAX parser.
>
> If these changes (or others like them) need to happen (and I think they
> do), then it makes sense to do it right, and comprehensively.
>
> I am going to play with the code a little to get an idea of what can be
> done, but I am looking for any ideas, suggestions, criticisms.
>
> I have already made some changes affecting the JDOM2 API but I think
> this could be one of those changes that makes a real difference (for the
> better).
>
> Rolf
>
>
> On 18/11/2011 7:32 PM, Rolf wrote:
>> I have updated the issue with some performance numbers for some different
>> conditions.
>>
>> Have a look at: https://github.com/hunterhacker/jdom/issues/52
>>
>> It seems to indicate that fixing the 'back to raw JAXP for each loop'
>> will only save a little time, but parser reuse saves a lot.
>>
>> Need to implement both options, I think: SAXFactory caching as
>> well as better memory management on Parser reuse.
>>
>> Out of interest, I thought the default setting for parser reuse was
>> 'false', but it is true. XMLReaders will be reused unless you explicitly
>> call setReuseParser(false).
>>
>> This in turn means that my comments about 'normal' process should be
>> reversed: the normal case for this bug condition is that we keep a
>> reference from the SAXBuilder to the Document for as long as the
>> SAXBuilder is active, and not used to rebuild another document.
>>
>> Thanks
>>
>> Rolf

From jdom at tuis.net Tue Nov 22 21:03:58 2011
From: jdom at tuis.net (Rolf)
Date: Wed, 23 Nov 2011 00:03:58 -0500
Subject: [jdom-interest] Snapshot Release: jdom-2.x-2011.11.22.22.58.zip
Message-ID: <4ECC7EBE.5050409@tuis.net>

Please take the new SAX Parser for a spin.

See the package API documentation here:
http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/input/sax/package-summary.html

See the new features in JDOM2 here:
https://github.com/hunterhacker/jdom/wiki/JDOM2-Features

and most importantly, download the new snapshot here:
https://github.com/hunterhacker/jdom/downloads

Thanks

Rolf