DOM Parsing With XPath and JavaScript
Wed 20 Apr 2005

The fact that GreaseMonkey can make the web behave the way we want it to is amazing in itself. The added bonus is that people are making lots of funky JavaScript public for us to dissect. Even if you don't intend to use any of the scripts, it's still worth your while seeing what you can learn from them.

Take my LDD Super Search script for example. Not wanting to blow my own trumpet but it contains some nifty JavaScript that's worth knowing about.

In order to operate on all links in the search results page we need to get a handle on them. To do this I used XPath. Here's the code:

var findPattern = "//table[2]//table[2]//table[2]//tr/td[4]/font/a";
var resultLinks = document.evaluate( findPattern, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null );
var i = 0;
var res;
while ((res = resultLinks.snapshotItem(i)) != null) {
  // do something to the link
  i++;
}

The important part to this is the XPath pattern I used. Using the DOM Inspector it was "easy" to see that each document's link lived inside an A tag, which lived inside a FONT tag, which was in the fourth cell (/TD[4]) of all the rows (//TR) of the second nested TABLE in the second nested TABLE of the second nested top-level TABLE of the document. Simple. This translates to:

//table[2]//table[2]//table[2]//tr/td[4]/font/a

Passing this pattern to the document.evaluate() method supplies us with the relevant set of nodes from the DOM, which we can then cycle through and do what the hell we like to.
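
As a rough sketch of that cycle, the snapshot can be copied into a plain array so the rest of a script doesn't have to deal with snapshotItem(). The helper name xpathNodes is hypothetical (it isn't part of GreaseMonkey or the DOM), and the fallback value 7 is simply the numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, there so the function can be tried against a stand-in document outside the browser.

```javascript
// Hypothetical helper (not part of the DOM or GreaseMonkey): runs an XPath
// pattern against a document and returns the matching nodes as an array.
function xpathNodes(doc, pattern) {
  // 7 is the numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
  var type = (typeof XPathResult !== "undefined")
    ? XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
    : 7;
  var snapshot = doc.evaluate(pattern, doc, null, type, null);
  var nodes = [];
  // snapshotLength tells us how many nodes matched the pattern.
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}
```

In a user script this would be called as xpathNodes(document, findPattern), and the returned array can then be looped over like any other.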

Using XPath you can find pretty much any part of the page you like. Here's an example of what you can do.

XPath can make your code simpler. Take Julian's script from the other day. To find every image on the page that's a certain file it uses this code:

var images = document.getElementsByTagName("img");
for (var i = 0; i < images.length; i++) {
  var n = images[i];
  var src = n.getAttribute("src");
  if (src && src.indexOf("threadmap_inactive.gif") > -1) {
    // do stuff here
  }
}

I don't want to teach my grandmother how to suck eggs but it's worth noting that it's much easier when using XPath. We just need the following pattern:

//img[contains(@src, 'threadmap_inactive.gif')]
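
To show the pattern in use, here's a sketch of how the loop above might look with XPath doing the filtering. The function name imagesWithSrc is made up for illustration, and it assumes the file name contains no single quote, since the name is pasted straight into the pattern.

```javascript
// Hypothetical rewrite of the image loop, letting XPath do the filtering.
// Assumes fileName contains no single quote, because it is placed between
// single quotes inside the pattern.
function imagesWithSrc(doc, fileName) {
  var pattern = "//img[contains(@src, '" + fileName + "')]";
  // 7 is the numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
  var result = doc.evaluate(pattern, doc, null, 7, null);
  var found = [];
  var img;
  // snapshotItem() returns null once the index runs past the last match.
  for (var i = 0; (img = result.snapshotItem(i)) !== null; i++) {
    found.push(img);
  }
  return found;
}
```

Called as imagesWithSrc(document, "threadmap_inactive.gif"), it hands back only the matching images, so there's no src check left to do inside the loop.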

Until now I've always thought of XPath (as well as XML, XSLT etc to a degree) as being nice to know about but practically useless in everyday life. When was the last time you actually used XSL in your role as a web developer!? Well, now you can. This is code that you can actually use in your everyday application! The same principles (and code, in theory) can be used in IE. Here's one method, which requires a sizeable JS file.
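
If a script has to run somewhere document.evaluate() might be missing, one option is to test for it first and fall back to plain DOM code. What follows is only a minimal sketch of that idea, not how any particular IE library does it; selectNodes and its fallback argument are hypothetical names.

```javascript
// Hypothetical wrapper: use XPath where the document supports it,
// otherwise hand the document to a caller-supplied fallback function
// that finds the same nodes with plain DOM calls.
function selectNodes(doc, pattern, fallback) {
  if (typeof doc.evaluate !== "function") {
    return fallback(doc);
  }
  // 7 is the numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
  var result = doc.evaluate(pattern, doc, null, 7, null);
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

The fallback would typically be a getElementsByTagName() loop that checks each element by hand, so the XPath branch is a convenience rather than a requirement.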

I keep finding myself looking forward to the day I get to work on a Firefox-only application...

Comments

- Fredrik
- Wed 20 Apr 2005 04:44
Very Nice! I tend to use XML (et al) more often in my everyday development (but not with javascript).
It's a really nice way to integrate and process information between different systems.
In most cases you don't need to spend big $ on a big "bloated" middleware product to share information between Domino and other systems/technologies (unless the integration between the systems plays a major part, contains a complex set of rules and involves many systems in different flavours).
For 1 to 1 integration/communication, XML over HTTP or similar often solves the problem.
- Jerry Carter
- Wed 20 Apr 2005 10:32
Thanks Jake, That's a handy little snippet.
Unfortunately, the only useful ideas immediately coming to mind all revolve around variations of 'Mad-Libs' where I might use a regexp to look for certain words within p nodes and replace them with certain other words... actually, not useful, but amusing.
Come to think of it, a while back, Julian was working on a text highlighting function that was intended to enhance Domino search results pages to be more Google-ish (à la cache view). I played with that quite a bit on my own. It would make a great GreaseMonkey add-on, extending Firefox's highlight to fuzzy search phrase/word highlighting.
- Jake Howlett
- Wed 20 Apr 2005 12:09
Fredrik. I think that's why I don't get to use XML all that much - I rarely get to make one system talk to another...
Jerry. There's some interesting XPath in the Linkify script (turns plain text links into actual clickable links) which would give us something like this for finding the word "test" in a P (I think):
//text()[(parent::p) and contains(.,'test')]
Search highlighting is an interesting one. Hadn't thought of that. Codestore's search results don't use &highlight= because I use HTML for the view. GM could easily add this parameter back in. Another script could then do the actual highlighting (Domino won't do it as I am using PTHTML!). If only I had the time...
- Jerry Carter
- Tue 17 May 2005 15:19
Another thing I'm finding, IE doesn't support document.evaluate, unfortunately. I was hoping to build some post load page modification into my document to save manually writing HTML into a form. Oh well.
- Bernardo
- Sat 21 May 2005 03:41
Jerry, take a look at
{Link}
The guy there has built a handy JS library that provides IE 5+ with the document.evaluate() method. The code (a single .js file) is released under the GPL and you can download it from SourceForge.
I haven't played around with it yet, only tested the example html page that comes along with the code. Seems to work fine!
- Joel "Jaykul" Bennett
- Fri 3 Jun 2005 15:09
Glazkov has some cool stuff there, but if you try to use document.evaluate in IE that way, it's horribly slow, compared to the native implementation. I'm not sure what needs to be done to improve that...
- blundith
- Fri 1 Jul 2005 06:07
Em, using XSL as a web developer? Almost every day for the last two years.
Regards :)
- Philippe Lhoste
- Thu 2 Nov 2006 09:33 AM
Interesting. I was looking for a way to convert XPath to more classical JavaScript path: document.forms[1].elements[2] for example. This can be an alternative, but as usual, IE support is dragging behind...
Of related interest: XPather, https://addons.mozilla.org/firefox/1192/
It helps with finding an XPath from the web page.
- randall
- Sun 16 Nov 2008 02:27 AM
i am writing a multi-browser xml lib. i came up with a fix for that to let me get string/number/bool results from IE with xpath queries. if someone knows how to get the current html source from document in IE, i bet this could be modified to work with the current document and applied by IE's behaviors.
if you know how to do xml transformations in IE you should be able to understand the following snippet. it returns a text output that js can convert. the FF source is there too for comparison:
this.query=function(udef_xpath, udef_resultType){
  var result=false;
  if(!udef_resultType) var udef_resultType="string";
  if( browser.isIE()==true ){
    var tmpStyleSheet='';
    var p=this.asXML();
    var newXML=new erjXMLDocument( p );
    newXML.loadXSLDocument( tmpStyleSheet );
    var q=newXML.transform();
    newXML=null;
    switch(udef_resultType.toLowerCase()){
      case "number":
        result=parseFloat(q);
        break;
      case "bool":
        result=( (q.indexOf("TRUE") > -1) || (q.indexOf("true") > -1) || (q.indexOf("1") > -1) ? true : false );
        break;
      default:
        result=q;
        break;
    }
  }else{
    udef_resultType=( udef_resultType.toLowerCase().indexOf("number") > -1 ? XPathResult.NUMBER_TYPE : ( udef_resultType.toLowerCase().indexOf("bool") > -1 ? XPathResult.BOOLEAN_TYPE : XPathResult.STRING_TYPE ) );
    result=xmlDocument.evaluate( udef_xpath, xmlDocument, null, udef_resultType, null);
    switch(udef_resultType){
      case XPathResult.NUMBER_TYPE:
        result=result.numberValue;
        break;
      case XPathResult.BOOLEAN_TYPE:
        result=result.booleanValue;
        break;
      default:
        result=result.stringValue;
        break;
    }
  }
  return result;
}
--randall
- Jeff G
- Thu 30 Apr 2009 05:20 PM
Great post and great information. Thanks Jake. I have an issue here that you might be able to clarify based on your experience. I built a nice Java Library that encapsulates my code and is reusable in a sister application. I am using R7 and ensured all of my Java code was 1.4 compliant. The problem I am having is that it seems XPath is not part of Domino's build of the JRE. I am able to compile but get XPathFactory runtime errors all over the place. If I include a 4MB jaxp-ri.jar I no longer get errors, but my performance gets killed loading the jar. Is this a problem with my version of domino designer (7.0.1)???...ooops need to upgrade to 7.0.2 anyway. Your thoughts?