<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Using XPath on real-world HTML documents</title>
	<atom:link href="http://nicklothian.com/blog/2006/09/11/using-xpath-on-real-world-html-documents/feed/" rel="self" type="application/rss+xml" />
	<link>http://nicklothian.com/blog/2006/09/11/using-xpath-on-real-world-html-documents/</link>
	<description>My Blog, Take 4</description>
	<lastBuildDate>Wed, 18 Jan 2012 14:08:16 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: f1sh</title>
		<link>http://nicklothian.com/blog/2006/09/11/using-xpath-on-real-world-html-documents/comment-page-1/#comment-755</link>
		<dc:creator>f1sh</dc:creator>
		<pubDate>Wed, 26 May 2010 10:18:47 +0000</pubDate>
		<guid isPermaLink="false">http://nicklothian.com/blog/?p=12#comment-755</guid>
		<description>This is great stuff.
I had the problem of unquoted HTML attributes (such as &#039;href=http://google.com&#039;), which screwed up my initial approaches using SAX/JDOM/...
I had to figure out the imports (Maven dependencies), which were &#039;tagsoup&#039; and &#039;xom&#039;.

Note:
The line
XPathContext context = new XPathContext(&quot;html&quot;, &quot;http://www.w3.org/1999/xhtml&quot;);
binds the prefix html to the XHTML namespace, so the HTML elements in your document are only XPath-addressable using that prefix.
Meaning: to get all HTML links with images inside them, you have to use
doc.query(&quot;//html:a[.//html:img]&quot;, context); // note the prefix html; the leading dot keeps the img test relative to each link

best regards</description>
		<content:encoded><![CDATA[<p>This is great stuff.<br />
I had the problem of unquoted HTML attributes (such as &#8216;href=http://google.com&#8217;), which screwed up my initial approaches using SAX/JDOM/&#8230;<br />
I had to figure out the imports (Maven dependencies), which were &#8216;tagsoup&#8217; and &#8216;xom&#8217;.</p>
<p>Note:<br />
The line<br />
XPathContext context = new XPathContext(&quot;html&quot;, &quot;http://www.w3.org/1999/xhtml&quot;);<br />
binds the prefix html to the XHTML namespace, so the HTML elements in your document are only XPath-addressable using that prefix.<br />
Meaning: to get all HTML links with images inside them, you have to use<br />
doc.query(&quot;//html:a[.//html:img]&quot;, context); // note the prefix html; the leading dot keeps the img test relative to each link</p>
<p>best regards</p>
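<p>Editor's sketch: the prefix behaviour described above is not specific to XOM. The following is a minimal illustration using the JDK's own javax.xml.xpath API instead of XOM's XPathContext, so it runs without the tagsoup and xom dependencies. The class name NamespacePrefixDemo, the helper methods, and the tiny sample document are assumptions made up for this example, not part of the original post.</p>

```java
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NamespacePrefixDemo {
    public static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample: two links in the XHTML namespace, only the first wraps an image.
    public static Document buildSample() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();
        Element html = doc.createElementNS(XHTML, "html");
        Element body = doc.createElementNS(XHTML, "body");
        Element withImage = doc.createElementNS(XHTML, "a");
        Element plain = doc.createElementNS(XHTML, "a");
        withImage.appendChild(doc.createElementNS(XHTML, "img"));
        plain.setTextContent("text only");
        body.appendChild(withImage);
        body.appendChild(plain);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    // Counts the nodes matched by an XPath expression, with "html" bound to the XHTML namespace.
    public static int count(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "html".equals(prefix) ? XHTML : XMLConstants.NULL_NS_URI;
            }

            public String getPrefix(String uri) {
                return XHTML.equals(uri) ? "html" : null;
            }

            // Raw Iterator keeps this compiling on both older and newer JDKs.
            @SuppressWarnings({"rawtypes", "unchecked"})
            public Iterator getPrefixes(String uri) {
                if (XHTML.equals(uri)) {
                    return Collections.singletonList("html").iterator();
                }
                return Collections.emptyList().iterator();
            }
        });
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildSample();
        System.out.println(count(doc, "//a"));                    // 0: unprefixed names match no namespace
        System.out.println(count(doc, "//html:a"));               // 2: both links
        System.out.println(count(doc, "//html:a[.//html:img]"));  // 1: only the link containing an image
        System.out.println(count(doc, "//html:a[//html:img]"));   // 2: absolute predicate is true for every link
    }
}
```

<p>The last two queries show why the predicate should start with a dot: an absolute path inside the predicate only asks whether any image exists anywhere in the document, so it does not filter the links at all.</p>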
]]></content:encoded>
	</item>
	<item>
		<title>By: priit</title>
		<link>http://nicklothian.com/blog/2006/09/11/using-xpath-on-real-world-html-documents/comment-page-1/#comment-660</link>
		<dc:creator>priit</dc:creator>
		<pubDate>Tue, 04 Aug 2009 09:28:09 +0000</pubDate>
		<guid isPermaLink="false">http://nicklothian.com/blog/?p=12#comment-660</guid>
		<description>That&#039;s it; so far I&#039;ve seen no one do it, and my own attempts have failed :)</description>
		<content:encoded><![CDATA[<p>That&#8217;s it; so far I&#8217;ve seen no one do it, and my own attempts have failed :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick Lothian</title>
		<link>http://nicklothian.com/blog/2006/09/11/using-xpath-on-real-world-html-documents/comment-page-1/#comment-659</link>
		<dc:creator>Nick Lothian</dc:creator>
		<pubDate>Mon, 03 Aug 2009 23:20:35 +0000</pubDate>
		<guid isPermaLink="false">http://nicklothian.com/blog/?p=12#comment-659</guid>
		<description>Yes, for sure. You could use TagSoup to cleanse HTML data in RSS/Atom. I don&#039;t know if anyone has done it, though.</description>
		<content:encoded><![CDATA[<p>Yes, for sure. You could use TagSoup to cleanse HTML data in RSS/Atom. I don&#8217;t know if anyone has done it, though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: priit</title>
		<link>http://nicklothian.com/blog/2006/09/11/using-xpath-on-real-world-html-documents/comment-page-1/#comment-658</link>
		<dc:creator>priit</dc:creator>
		<pubDate>Mon, 03 Aug 2009 21:33:37 +0000</pubDate>
		<guid isPermaLink="false">http://nicklothian.com/blog/?p=12#comment-658</guid>
		<description>Hi,

Can you tell me whether it&#039;s possible to pair up TagSoup and Rome?</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>Can you tell me whether it&#8217;s possible to pair up TagSoup and Rome?</p>
]]></content:encoded>
	</item>
</channel>
</rss>


