[jdom-interest] JDOM and text outside tags

Stein Erik Berget seberget at escenic.com
Wed Oct 22 23:30:37 PDT 2003


On Wed, 22 Oct 2003 18:06:09 +0900, Jacques-Albert De Blasio 
<jacquesalbert.deblasio at toshiba.co.jp> wrote:

> Hi all,
>
> I have a problem with JDOM and I am sure that one of you JDOM gurus could 
> help me out :)
>
> In a program I'm writing, I first fetch HTML pages from the web, tidy them 
> with NekoHTML (JTidy was not sufficient as it could not parse Japanese 
> HTML pages) and then transform the DOM output by NekoHTML into JDOM 
> documents.
>
> My problem is the following: in a given page, I have tags such as
>
> <TD>
> <SMALL>
> <IMG src = "..." /> some_text <BR />
> <IMG src =" ..." /> some_other_text <BR />
> </SMALL>
> </TD>
>
> How can I fetch the "some_text" and "some_other_text"?

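You can get the text from the <SMALL> element with code along these lines:

    // all text directly inside <SMALL>, concatenated, whitespace kept
    smallElement.getText();
    // the same text, trimmed and with runs of whitespace collapsed
    smallElement.getTextNormalize();

Note that both calls return the two pieces of text together as one string.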

Once you have the <SMALL> tag as an Element, as you can see, this is quite 
easy to accomplish.
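
If the two strings are needed separately rather than as one value, a rough 
sketch along these lines might help. To keep it runnable with only the JDK, 
it walks the W3C DOM (the kind of tree NekoHTML hands over before the 
conversion to JDOM) and collects each non-blank text node directly under 
<SMALL>. The class name TextRuns, the helpers textRuns and parse, and the 
sample markup are all made up for illustration. With JDOM the same idea 
should apply by walking smallElement.getContent() and keeping the 
org.jdom.Text items.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JDOM or NekoHTML.
class TextRuns {

    // Returns every non-blank text node that is a direct child of parent,
    // trimmed, in document order.
    public static List<String> textRuns(Node parent) {
        List<String> runs = new ArrayList<String>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    runs.add(text);
                }
            }
        }
        return runs;
    }

    // Parses a well-formed markup string into a W3C DOM document.
    // In the real program the document would come from NekoHTML instead.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        // Sample markup modelled on the question; the src values are placeholders.
        String xml = "<TD><SMALL>\n"
                + "<IMG src=\"a.gif\"/> some_text <BR/>\n"
                + "<IMG src=\"b.gif\"/> some_other_text <BR/>\n"
                + "</SMALL></TD>";
        Node small = parse(xml).getElementsByTagName("SMALL").item(0);
        for (String run : textRuns(small)) {
            System.out.println(run);
        }
    }
}
```

Only direct children are inspected, so text nested inside further tags 
under <SMALL> would need a recursive walk.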

Good luck, and have a nice day!


-- 
Stein Erik Berget
Research & Development
Escenic AS                              + 47 23 27 34 40 (switchboard)
Sommerogt 13-15                         + 47 23 27 34 01 (fax)
Box 2393 Solli                          http://www.escenic.com/
N-0201 OSLO


