[jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element

Rolf Lear jdom at tuis.net
Thu Nov 8 03:35:45 PST 2012


Hi Lars.

Indeed, file a bug against JTidy.

Here are the offending lines of code in JTidy's DOMNodeImpl class:

     /**
      * @todo DOM level 3 getTextContent() Not implemented. Returns null.
      * @see org.w3c.dom.Node#getTextContent()
      */
     public String getTextContent() throws DOMException
     {
         return null;
     }


That's from line 523 of:
http://jtidy.svn.sourceforge.net/viewvc/jtidy/trunk/jtidy/src/main/java/org/w3c/tidy/DOMNodeImpl.java?revision=1132&view=markup
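Until that is fixed, code that needs the text of a JTidy node can build it from the child nodes instead of calling getTextContent(). The following is a minimal sketch. It assumes that JTidy's getNodeType(), getNodeValue(), getFirstChild() and getNextSibling() behave correctly, which has not been verified here. The DomText class and textOf() method are hypothetical names. The sketch uses only DOM Level 1 calls and, like getTextContent() on an element, skips comments and processing instructions:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class DomText {

    // Concatenates the values of all Text and CDATA descendants of a node,
    // using only DOM Level 1 methods instead of the Level 3 getTextContent().
    public static String textOf(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private static void collect(Node node, StringBuilder sb) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String value = node.getNodeValue();
            if (value != null) {
                sb.append(value);
            }
            return;
        }
        // Comments and processing instructions have no children,
        // so they contribute nothing, matching getTextContent() on elements.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, sb);
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo with the JDK's built-in parser; with JTidy you would pass
        // nodes from the Document returned by tidy.parseDOM(...) instead.
        String xml = "<p>Hello <b>JDOM</b> world</p>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        System.out.println(textOf(doc.getDocumentElement())); // prints "Hello JDOM world"
    }
}
```

Call DomText.textOf(node) wherever your own code would otherwise call getTextContent(). This does not change what any library does internally with the JTidy DOM.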


Rolf


On 08/11/2012 4:20 AM, Larsen wrote:
> Hi Rolf,
>
> first of all, thanks for your extensive help!
>
>
>> The Java API documentation is a mess in this area. The JDK 1.5 package
>> information indicates that the org.w3c.dom API supports DOM Level 2:
>> http://docs.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/package-summary.html
>>
>
> That's nice to hear. I was already wondering whether my English is too
> bad or if the javadoc is so crudely written that I can't understand it.
>
>
>> What would be useful is if you could determine the library that you
>> are using. Since you have already 'hacked' the code, why don't you
>> temporarily add the line: System.out.println(text.getClass()); to the
>> method. This will tell you the concrete implementation of DOM that's
>> broken.
>
> It's "org.w3c.tidy.DOMTextImpl". I use JTidy to convert HTML I obtain
> from a customer's database into Java objects.
> So, should I file a bug against JTidy?
>
>
> Here is my code in that area, in case it helps:
>
>      private org.w3c.dom.Document getDocFromTidy(String html) {
>
>          Tidy tidy = new Tidy();
>          tidy.setShowWarnings(false);
>          tidy.setQuiet(true);
>          tidy.setXHTML(true);
>          tidy.setDocType("omit");
>
>          // convert text representation to Document
>          InputStream bais = new ByteArrayInputStream(html.getBytes());
>
>          try {
>              // parse first, and close the stream only afterwards
>              // (closing a ByteArrayInputStream has no effect, but closing
>              // it before it is read is misleading)
>              return tidy.parseDOM(bais, null);
>          } finally {
>              try {
>                  bais.close();
>              } catch (IOException e) {
>                  log.error("Exception on closing the InputStream", e);
>              }
>          }
>      }
>
>
>
> Lars
>


