[jdom-interest] Parsing HTML elements

Rolf Lear jdom at tuis.net
Tue Nov 20 09:14:02 PST 2012

Hmmm, so you are not using the default API.

JDOM expects the getURI() method to return a non-empty value if the attribute has a prefix. This is reasonable... ;)

This indicates the SAX stream is broken. JDOM should be throwing "Namespace URIs must be non-null and non-empty Strings".

If you cannot fix the SAX stream code, you can maybe write a proxy class that fixes the URIs as the events pass through.
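
A rough sketch of such a proxy, built on the standard org.xml.sax.helpers.XMLFilterImpl. The class name PrefixUriFixer and the idea of synthesizing a URI as "base + prefix" are only assumptions for illustration; a real fix would more likely map each known prefix to its proper namespace URI.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical SAX filter: gives every prefixed attribute that arrives
// without a namespace URI a synthesized URI, so a downstream handler
// never sees a prefix paired with an empty URI.
final class PrefixUriFixer extends XMLFilterImpl {

    // Assumed placeholder base; the prefix is appended to it.
    private final String baseUri;

    PrefixUriFixer(String baseUri) {
        this.baseUri = baseUri;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // Copy the attributes so they can be modified.
        AttributesImpl fixed = new AttributesImpl(atts);
        for (int i = 0; i < fixed.getLength(); i++) {
            String attQName = fixed.getQName(i);
            String attUri = fixed.getURI(i);
            int colon = attQName.indexOf(':');
            boolean prefixed = colon > 0 && !attQName.startsWith("xmlns:");
            boolean missingUri = attUri == null || attUri.isEmpty();
            if (prefixed && missingUri) {
                String prefix = attQName.substring(0, colon);
                fixed.setURI(i, baseUri + prefix);
                fixed.setLocalName(i, attQName.substring(colon + 1));
            }
        }
        // Pass the repaired event on to the next handler in the chain.
        super.startElement(uri, localName, qName, fixed);
    }
}
```

The filter would sit between the parser and JDOM, for example by handing it to SAXBuilder.setXMLFilter(), or by calling setContentHandler() on it with the JDOM handler when you drive the SAX parser yourself. Only the attribute URIs are touched; everything else passes through unchanged.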


Paul Libbrecht <paul at hoplahup.net> wrote:
Hello JDOM experts,

I'm hitting a wall here and I am not sure who is responsible.
Just like the previous series of posts, I am trying to parse an HTML document.
In this case I use the CyberNeko HTML parser http://nekohtml.sourceforge.net/ which produces a SAX stream and is therefore easy to convert into a JDOM document.

Now, my big issue is that the document I have (which I cannot easily change right now) contains attribute names with undeclared namespace prefixes!

Do I have a way to predefine the namespace somewhere?

thanks in advance
