Parsing HTML tags using XPath in Java


Virgo
01-15-2005, 06:50 PM
Hi,
I am using XPath to parse an XML document like this:
<StoryText>
<p>
Somebody said:<q>Quoted text</q>
</p>
<p>
Find more relevant text at: <a href="#"></a>
</p>
</StoryText>

The code to parse this is as follows:
NodeList storyParaList = XPathAPI.selectNodeList(storyNode, "StoryText/*");
for (int i = 0; i < storyParaList.getLength(); i++) {
    Node storyTextChildNode = storyParaList.item(i);
    if (storyTextChildNode.getNodeType() == Node.ELEMENT_NODE) {
        storyStringBuffer.append("<" + ((Element) storyTextChildNode).getTagName() + ">");
        if ((XPathAPI.selectSingleNode(storyTextChildNode, "text()")) != null) {
            storyStringBuffer.append(XPathAPI.selectSingleNode(storyTextChildNode, "text()").getNodeValue());
        }
        storyStringBuffer.append("</" + ((Element) storyTextChildNode).getTagName() + ">");
    }
}

The problem is that the text inside a <p> tag is picked up only until another tag such as <a> or <q> appears in it. As soon as the parser encounters a tag like <a>, <q>, or <b>, it skips the rest of the text and moves on to the next <p> node. I want all of these tags to show up in the output text.

Can somebody help me with this please?
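
For reference, the behaviour described above follows from the XPath itself: "text()" matches only the text nodes that are direct children of the <p> element, and selectSingleNode returns just the first match, so everything from the first nested tag onward is dropped. One possible way around it, sketched here with the JDK's plain DOM API rather than XPathAPI, is to walk every child node recursively and write out both text and tags. The class StoryMarkup and its methods escape, appendMarkup, and render are hypothetical names made up for this illustration, and the sketch assumes that comments and processing instructions can be ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the original code.
final class StoryMarkup {

    // Escapes characters that would otherwise be read back as markup.
    static String escape(String s, boolean inAttribute) {
        String r = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return inAttribute ? r.replace("\"", "&quot;") : r;
    }

    // Appends a node and all of its descendants to the buffer as markup.
    static void appendMarkup(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(escape(node.getNodeValue(), false));
        } else if (type == Node.ELEMENT_NODE) {
            String tag = node.getNodeName();
            out.append('<').append(tag);
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.append(' ').append(a.getNodeName()).append("=\"")
                   .append(escape(a.getNodeValue(), true)).append('"');
            }
            out.append('>');
            // Recurse into every child, so text after <q>, <a>, <b> is kept.
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                appendMarkup(c, out);
            }
            out.append("</").append(tag).append('>');
        }
        // Assumption: other node types (comments, processing instructions) are skipped.
    }

    // Parses the XML and renders each element child of the root, like "StoryText/*".
    static String render(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            for (Node c = doc.getDocumentElement().getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE) {
                    appendMarkup(c, out);
                }
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<StoryText><p>Somebody said:<q>Quoted text</q></p>"
                   + "<p>Find more relevant text at: <a href=\"#\"></a></p></StoryText>";
        System.out.println(render(xml));
    }
}
```

In the original loop, the same idea would mean calling something like appendMarkup(storyTextChildNode, buffer) in place of the two selectSingleNode calls. An identity javax.xml.transform.Transformer with OutputKeys.OMIT_XML_DECLARATION set to "yes" is another option for serializing each <p> node, if hand-written escaping is not wanted.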