<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: xpath</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/xpath.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2008-02-11T05:31:55+00:00</updated><author><name>Simon Willison</name></author><entry><title>Quoting John Resig</title><link href="https://simonwillison.net/2008/Feb/11/john/#atom-tag" rel="alternate"/><published>2008-02-11T05:31:55+00:00</published><updated>2008-02-11T05:31:55+00:00</updated><id>https://simonwillison.net/2008/Feb/11/john/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://ejohn.org/blog/xpath-overnight/"&gt;&lt;p&gt;"Why doesn't jQuery have an XPath CSS Selector implementation?" For now, my answer is: I don't want two selector implementations - it makes the code base significantly harder to maintain, increases the number of possible cross-browser bugs, and drastically increases the filesize of the resulting download.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://ejohn.org/blog/xpath-overnight/"&gt;John Resig&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/css"&gt;css&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/john-resig"&gt;john-resig&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jquery"&gt;jquery&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xpath"&gt;xpath&lt;/a&gt;&lt;/p&gt;



</summary><category term="css"/><category term="john-resig"/><category term="jquery"/><category term="performance"/><category term="xpath"/></entry><entry><title>lxml.cssselect</title><link href="https://simonwillison.net/2007/Sep/24/lxmlcssselect/#atom-tag" rel="alternate"/><published>2007-09-24T23:57:17+00:00</published><updated>2007-09-24T23:57:17+00:00</updated><id>https://simonwillison.net/2007/Sep/24/lxmlcssselect/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://codespeak.net/lxml/dev/cssselect.html"&gt;lxml.cssselect&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
lxml includes an implementation of CSS 3 selectors, which compiles them to XPath expressions. Should be a useful tool for parsing Microformats from Python.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://blog.ianbicking.org/2007/09/24/lxmlhtml/"&gt;Ian Bicking&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/css"&gt;css&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/css3"&gt;css3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/libxml2"&gt;libxml2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lxml"&gt;lxml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/microformats"&gt;microformats&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/selectors"&gt;selectors&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xpath"&gt;xpath&lt;/a&gt;&lt;/p&gt;



</summary><category term="css"/><category term="css3"/><category term="libxml2"/><category term="lxml"/><category term="microformats"/><category term="python"/><category term="selectors"/><category term="xpath"/></entry><entry><title>Mozilla XPath Documentation</title><link href="https://simonwillison.net/2005/Apr/14/mozilla/#atom-tag" rel="alternate"/><published>2005-04-14T12:57:06+00:00</published><updated>2005-04-14T12:57:06+00:00</updated><id>https://simonwillison.net/2005/Apr/14/mozilla/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www-jcsu.jesus.cam.ac.uk/~jg307/mozilla/xpath-tutorial.html"&gt;Mozilla XPath Documentation&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
This is extremely useful for writing Greasemonkey user scripts.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/greasemonkey"&gt;greasemonkey&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mozilla"&gt;mozilla&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xpath"&gt;xpath&lt;/a&gt;&lt;/p&gt;



</summary><category term="greasemonkey"/><category term="mozilla"/><category term="xpath"/></entry><entry><title>Using XPath to mine XHTML</title><link href="https://simonwillison.net/2003/Oct/21/xpathRocks/#atom-tag" rel="alternate"/><published>2003-10-21T05:31:23+00:00</published><updated>2003-10-21T05:31:23+00:00</updated><id>https://simonwillison.net/2003/Oct/21/xpathRocks/#atom-tag</id><summary type="html">
    &lt;p&gt;This morning, I finally decided to &lt;a href="http://users.skynet.be/sbi/libxml-python/" title="Libxml and Libxslt Python Bindings for Windows"&gt;install libxml2&lt;/a&gt; and see what &lt;a href="http://www.xmldatabases.org/WK/blog/607?t=item" title="Givin libxml2 some love"&gt;all the fuss&lt;/a&gt; was about, in particular with respect to XPath. What followed is best described as an enlightening experience.&lt;/p&gt;

&lt;p&gt;XPath is a beautifully elegant way of addressing "nodes" within an &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; document. XPath expressions look a little like file paths, for example:&lt;/p&gt;

&lt;dl&gt;
 &lt;dt&gt;&lt;code class="xpath"&gt;/first/second&lt;/code&gt;&lt;/dt&gt;
 &lt;dd&gt;Match any &lt;code class="xml"&gt;&amp;lt;second&amp;gt;&lt;/code&gt; elements that occur inside a &lt;code class="xml"&gt;&amp;lt;first&amp;gt;&lt;/code&gt; element that is the root element of the document&lt;/dd&gt;
 &lt;dt&gt;&lt;code class="xpath"&gt;//second&lt;/code&gt;&lt;/dt&gt;
 &lt;dd&gt;Match all &lt;code class="xml"&gt;&amp;lt;second&amp;gt;&lt;/code&gt; elements irrespective of their place in the document&lt;/dd&gt;
 &lt;dt&gt;&lt;code class="xpath"&gt;//second[@hi]&lt;/code&gt;&lt;/dt&gt;
 &lt;dd&gt;Match all &lt;code class="xml"&gt;&amp;lt;second&amp;gt;&lt;/code&gt; elements with a 'hi' attribute&lt;/dd&gt;
 &lt;dt&gt;&lt;code class="xpath"&gt;//second[@hi="there"]&lt;/code&gt;&lt;/dt&gt;
 &lt;dd&gt;Match all &lt;code class="xml"&gt;&amp;lt;second&amp;gt;&lt;/code&gt; elements with a 'hi' attribute that equals "there"&lt;/dd&gt;
&lt;/dl&gt;
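
&lt;p&gt;As a rough illustration of the four expressions above, here is a minimal sketch using Python's standard-library &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; instead of libxml2. It is a stand-in under stated assumptions: ElementTree supports only a subset of XPath and evaluates paths relative to an element, so &lt;code class="xpath"&gt;/first/second&lt;/code&gt; is written as &lt;code class="xpath"&gt;second&lt;/code&gt; against the root, and &lt;code class="xpath"&gt;//second&lt;/code&gt; as &lt;code class="xpath"&gt;.//second&lt;/code&gt;. The sample document, including the &lt;code&gt;wrapper&lt;/code&gt; element, is invented for the example.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# Build a small sample document in code (hypothetical, for illustration only):
# a 'first' root with two direct 'second' children and one nested deeper.
root = ET.Element("first")
ET.SubElement(root, "second")                 # no attributes
ET.SubElement(root, "second", hi="there")     # hi="there"
wrapper = ET.SubElement(root, "wrapper")
ET.SubElement(wrapper, "second", hi="other")  # nested, hi="other"

# Relative equivalent of /first/second: direct 'second' children of the root.
children = root.findall("second")

# Relative equivalent of //second: every 'second' anywhere below the root.
anywhere = root.findall(".//second")

# Attribute predicates: any 'hi' attribute, then 'hi' equal to "there".
with_hi = root.findall(".//second[@hi]")
hi_there = root.findall(".//second[@hi='there']")

counts = (len(children), len(anywhere), len(with_hi), len(hi_there))
print(counts)  # prints (2, 3, 2, 1)
```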

&lt;p&gt;A full &lt;a href="http://www.zvon.org/xxl/XPathTutorial/General/examples.html"&gt;XPath tutorial&lt;/a&gt; is available.&lt;/p&gt;

&lt;p&gt;The Python libxml2 bindings make running XPath expressions incredibly simple. Here's some code that extracts the titles of all of the entries on my Kansas blog from the site's &lt;acronym title="Really Simple Syndication"&gt;RSS&lt;/acronym&gt; feed:&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;&amp;gt;&amp;gt;&amp;gt; import libxml2
&amp;gt;&amp;gt;&amp;gt; import urllib
&amp;gt;&amp;gt;&amp;gt; rss = libxml2.parseDoc(
      urllib.urlopen('http://www.a-year-in-kansas.com/syndicate/').read())
&amp;gt;&amp;gt;&amp;gt; rss.xpathEval('//item/title')
[&amp;lt;xmlNode (title) object at 0xb4b260&amp;gt;, &amp;lt;xmlNode (title) object at 0xa99968&amp;gt;, 
&amp;lt;xmlNode (title) object at 0x10dce68&amp;gt;]
&amp;gt;&amp;gt;&amp;gt; [node.content for node in rss.xpathEval('//item/title')]
['Music and Brunch', 'House hunting', 'Arrival']
&amp;gt;&amp;gt;&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Why is this so exciting? I've been &lt;a href="/2002/Jun/16/myFirstXhtmlMindBomb/" title="My first XHTML mind bomb"&gt;saying&lt;/a&gt; &lt;a href="/2002/Aug/11/benefitsOfXhtml/" title="Benefits of XHTML"&gt;for&lt;/a&gt; &lt;a href="/2003/Jan/06/xhtmlIsJustFine/" title="XHTML is just fine"&gt;over&lt;/a&gt; &lt;a href="/2003/Jan/08/xhtmlIsStillGreatForContent/" title="XHTML is still great for content"&gt;a&lt;/a&gt; &lt;a href="/2003/Aug/03/futureProotContent/" title="XHTML for future-proof content"&gt;year&lt;/a&gt; that &lt;acronym title="eXtensible HyperText Markup Language"&gt;XHTML&lt;/acronym&gt; is an ideal format for storing pieces of content in a database or content management system. Serving content to browsers as &lt;acronym title="HyperText Markup Language"&gt;HTML&lt;/acronym&gt; 4 makes perfect sense, but storing your actual content as &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; gives you the ability to process that content in the future using &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; tools.&lt;/p&gt;

&lt;p&gt;So far, the best example of a powerful tool for manipulating this stored &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; has been &lt;acronym title="eXtensible Stylesheet Language Transformations"&gt;XSLT&lt;/acronym&gt;. &lt;acronym title="eXtensible Stylesheet Language Transformations"&gt;XSLT&lt;/acronym&gt; has its fans, but is also often criticised as being unintuitive and having a steep learning curve. XPath is a far better example of a powerful, easy-to-use tool that can be brought to bear on &lt;acronym title="eXtensible HyperText Markup Language"&gt;XHTML&lt;/acronym&gt; content.&lt;/p&gt;

&lt;p&gt;Enough talk, here's an example of what I mean. The following code snippet creates a Python dictionary of all of the acronyms currently visible on the front page of my blog, mapping their shortened version to the expanded text (extracted from the title attribute):&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;
&amp;gt;&amp;gt;&amp;gt; blog = libxml2.parseDoc(
    urllib.urlopen('http://simon.incutio.com/').read())
&amp;gt;&amp;gt;&amp;gt; ctxt = blog.xpathNewContext()
&amp;gt;&amp;gt;&amp;gt; ctxt.xpathRegisterNs('xhtml', 'http://www.w3.org/1999/xhtml')
0
&amp;gt;&amp;gt;&amp;gt; acronyms = dict([(a.content, a.prop('title')) 
    for a in ctxt.xpathEval('//xhtml:acronym')])
&amp;gt;&amp;gt;&amp;gt; for acronym, fulltext in acronyms.items():
	print acronym, ':', fulltext


DHTML : Dynamic HyperText Markup Language
URL : Universal Republic of Love
HTML : HyperText Markup Language
SIG : Special Interest Group
PHP : PHP: Hypertext Preprocessor
CSS : Cascading Style Sheets
&amp;gt;&amp;gt;&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The above code is slightly more complicated than the first example, as using XPath with a document that uses &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; namespaces requires some extra work: you create an XPath context and register the namespace prefix with it before evaluating the expression. Still, it's a pretty short piece of code considering what it does.&lt;/p&gt;
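
&lt;p&gt;The same namespace idea can be sketched with Python's standard-library &lt;code&gt;xml.etree.ElementTree&lt;/code&gt;, which is an assumption here and not the libxml2 API used above. Instead of registering a prefix on a context, you pass a prefix-to-URI dictionary to &lt;code&gt;findall()&lt;/code&gt;. The two-acronym fragment below is built in code as a hypothetical stand-in for a fetched page.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Build a tiny namespaced XHTML fragment in code (hypothetical sample data).
body = ET.Element("{%s}body" % XHTML)
for short, full in [("HTML", "HyperText Markup Language"),
                    ("CSS", "Cascading Style Sheets")]:
    acronym = ET.SubElement(body, "{%s}acronym" % XHTML, title=full)
    acronym.text = short

# This mapping plays the role that xpathRegisterNs() plays in libxml2.
namespaces = {"xhtml": XHTML}

# Map each acronym's text to the expansion stored in its title attribute.
acronyms = {a.text: a.get("title")
            for a in body.findall(".//xhtml:acronym", namespaces)}

# Without the prefix nothing matches, because the elements are namespaced.
unprefixed = body.findall(".//acronym")

for short, full in sorted(acronyms.items()):
    print(short, ":", full)
```

&lt;p&gt;The prefix itself is arbitrary; only the namespace URI has to match the one used by the document.&lt;/p&gt;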

&lt;p&gt;For an example of how powerful XPath can be on a much larger scale, take a look at Sam Ruby's &lt;a href="http://www.intertwingly.net/blog/1601.html"&gt;XPath enabled blog search feature&lt;/a&gt;.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/libxml2"&gt;libxml2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xhtml"&gt;xhtml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xml"&gt;xml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xpath"&gt;xpath&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xslt"&gt;xslt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="libxml2"/><category term="python"/><category term="xhtml"/><category term="xml"/><category term="xpath"/><category term="xslt"/></entry></feed>