[jdom-interest] Simple xhtml/entity resolver?

Olivier Jaquemet olivier.jaquemet at jalios.com
Thu Mar 29 08:47:33 PDT 2012


Hi Oliver,

JDOM is a great tool for parsing XML...

... but for XHTML fragments (which may not be fully XHTML compliant?),
and especially for text extraction, I would strongly suggest JSoup:
http://jsoup.org/

   String text = org.jsoup.Jsoup.parse(html).text();

Whatever your HTML looks like, it will work like a charm (even if it is an
ugly WYSIWYG copy-paste from Word or an ugly HTML export from some website).
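
On the numeric entities part of your question: as far as I know, numeric
character references such as &#233; are expanded by the XML parser itself,
so they arrive as ordinary text and never show up as entity references.
Here is a minimal sketch of that behaviour using only the JDK's built-in
DOM parser rather than JDOM (I am assuming the effect is the same there,
since JDOM builds on a standard parser; CharRefDemo and textOf are made-up
names for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of JDOM or JSoup.
class CharRefDemo {

    // Parses a well-formed XML fragment with the JDK's DOM parser and
    // returns the concatenated text of the root element and its children.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Malformed input ends up here; this sketch does not try to recover.
            throw new IllegalArgumentException("Could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        // &#233; and &#xE9; are numeric references, &amp; is a predefined entity.
        String text = textOf("<p>caf&#233; &amp; <b>th&#xE9;</b></p>");
        // The reference was already decoded, so the code point is read
        // straight from the string.
        System.out.println(text.codePointAt(3)); // prints 233
    }
}
```

So the code point is simply text.codePointAt(i) on the resulting string.
Note that this only works on well-formed input, which is exactly where
JSoup is more forgiving.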

Olivier

On 29/03/2012 15:23, Oliver Ruebenacker wrote:
>       Hello,
>
>    I need a simple way to convert some XHTML fragments, provided as a
> JDOM Element, into plain text. I am willing to ignore most HTML tags
> and consider only the most commonly used predefined entities.
>
>    In JDOM, an entity reference has a name, a public id and a system
> id. I think I know what the name means, for named entities. But what
> about numeric entities, how do I get the code point? And what are
> public id and system id?
>
>    Thanks!
>
>       Take care
>       Oliver
>

-- 
Olivier Jaquemet <olivier.jaquemet at jalios.com>
Ingénieur R&D Jalios S.A. - http://www.jalios.com/
@OlivierJaquemet +33970461480



