Hopefully by now I've convinced you to take XSS seriously?! If so, you're probably keen to see what the solution is. Let's see, shall we?
The basic premise is that all field values submitted as part of your Domino web forms must be inspected for any potential attacks. And I mean all fields. As I showed on Wednesday even hidden and/or computed fields could be vulnerable.
How you filter the field depends on its type. For normal Text fields where you don't expect or want HTML, you need to remove all HTML. This is just a case of replacing all angle brackets (< or >) with &lt; or &gt; respectively. Job done. For fields where you want to allow HTML entry it gets a whole lot more complicated.
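For the plain-text case, a minimal Java sketch might look like the following. The class name PlainTextEscaper is hypothetical, and escaping the ampersand and quote characters as well as the angle brackets is an assumption that goes a little beyond the replacement described above.

```java
final class PlainTextEscaper {
    private PlainTextEscaper() {}

    /** Replaces the characters that are significant in HTML with entities. */
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                // The two replacements described in the text
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                // Assumption: also escape these, in case the value
                // ends up inside an attribute or next to an entity
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling PlainTextEscaper.escape on the field value just before it is stored or displayed turns any submitted markup into harmless text that the browser prints rather than interprets.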
You can either use a "tag whitelist" or "tag blacklist" of HTML tags you either want to allow or remove, respectively. It's probably easier to maintain a whitelist, which might include the following subset of HTML tags:
a, b, blockquote, br, caption, center, col, colgroup, comment, em, font, h3, h4, h5, h6, hr, img, li, ol, p, pre, s, small, span, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul
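As a rough illustration of the whitelist idea, here is a sketch in Java. It is not a finished filter: the class name TagWhitelistFilter is hypothetical, the whitelist is shortened, and using a regular expression to find tags is an assumption made to keep the example short (a real HTML parser would be more robust). Anything that isn't a well-formed whitelisted tag gets its angle brackets escaped, and the attributes of the tags that survive are deliberately left untouched here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagWhitelistFilter {
    // A shortened version of the whitelist above
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            "a", "b", "blockquote", "br", "em", "li", "ol", "p", "pre", "strong", "ul"));

    // An opening or closing tag; group 1 captures the tag name
    private static final Pattern TAG =
            Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)(?=[\\s/>])[^<>]*>");

    private TagWhitelistFilter() {}

    static String filter(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            // Text between tags: escape any stray angle brackets
            out.append(escape(html.substring(last, m.start())));
            String name = m.group(1).toLowerCase(Locale.ROOT);
            // Whitelisted tags pass through; everything else becomes visible text
            out.append(ALLOWED.contains(name) ? m.group() : escape(m.group()));
            last = m.end();
        }
        out.append(escape(html.substring(last)));
        return out.toString();
    }

    private static String escape(String s) {
        return s.replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Escaping the unwanted tags, rather than deleting them, is just one choice. It leaves their content on show as text, which sidesteps the question of what to do with whatever was inside them.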
If you do decide to use a blacklist then here's a taster of the kind of thing to look out for. All of these tags can be used in malice:
link, iframe, frame, frameset, object, param, embed, style, applet, meta, layer, import, xml, script, base
Once you've removed/replaced all the tags you don't want (while deciding what to do with their content) you then have to look at the inner attributes of the tags that remain. Here's a sample of an "attribute whitelist":
abbr, align, alt, background, bgcolor, border, cellpadding, cellspacing, cite, clear, color, cols, direction, face, font-weight, headers, height, href, hspace, leftpadding, loop, noshade, nowrap, point-size, rightpadding, rowspan, size, span, src, summary, target, title, toppadding, type, valign, value, vspace, width, wrap
If you use a blacklist then you need to look for any event-related attribute, such as onclick or onmouseover, all of which can do considerable harm.
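Here's a sketch of how the attribute whitelist might be applied in Java. It assumes the tag name and the raw attribute text have already been separated by an earlier step. The class name AttributeWhitelistFilter and the method rebuildTag are hypothetical, the whitelist is shortened, and the regular expression is an assumption that covers only name=value pairs (valueless attributes such as nowrap are simply dropped in this sketch).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttributeWhitelistFilter {
    // A shortened version of the attribute whitelist above
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            "align", "alt", "border", "height", "href", "src", "title", "width"));

    // name="value", name='value' or name=value; the name must follow whitespace
    private static final Pattern ATTR = Pattern.compile(
            "(?<!\\S)([a-zA-Z][a-zA-Z0-9:-]*)\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s\"'>]+)");

    private AttributeWhitelistFilter() {}

    /** Rebuilds an opening tag, keeping only the whitelisted attributes. */
    static String rebuildTag(String tagName, String attributeText) {
        StringBuilder out = new StringBuilder("<").append(tagName);
        Matcher m = ATTR.matcher(attributeText);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (ALLOWED.contains(name)) {
                out.append(' ').append(name)
                   .append("=\"").append(requote(m.group(2))).append('"');
            }
            // Anything else, onclick and friends included, is silently dropped
        }
        return out.append('>').toString();
    }

    // Strips the original quotes and escapes the value for a double-quoted attribute
    private static String requote(String raw) {
        String v = raw;
        if (v.length() >= 2 && (v.charAt(0) == '"' || v.charAt(0) == '\'')) {
            v = v.substring(1, v.length() - 1);
        }
        return v.replace("\"", "&quot;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Rebuilding the tag from scratch, rather than cutting the bad bits out of the original string, means that anything the pattern doesn't recognise never makes it into the output. The values of the attributes that are kept have not been inspected at this point.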
Nothing is ever easy though, is it? Even if you've removed the potentially naughty attributes, the actual value of the normally safe ones can in turn cause harm. Consider a user entering the following HTML:
For all the attributes you do allow you need to examine their actual values for naughty code. In the example above it's not even as simple as looking for the word "expression" in the value. You first need to strip it of comments and whitespace and make sure your search isn't case-sensitive. It all gets very complicated. Take a look at this extensive list of possible hacks to see just what we're dealing with.
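The normalise-then-search step could be sketched in Java roughly as follows. The class name AttributeValueCheck is hypothetical. Treating /* ... */ as the comment syntax to strip, and adding "javascript:" to the list of search terms alongside "expression", are both assumptions for the sake of the example. It does not decode character entities, which is one of the many tricks a real filter would also have to handle.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class AttributeValueCheck {
    // Assumption: comments in the value use the CSS-style /* ... */ form
    private static final Pattern COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    private static final Pattern WHITESPACE_OR_CONTROL = Pattern.compile("[\\s\\p{Cntrl}]+");

    // "expression" comes from the text; "javascript:" is an added assumption
    private static final String[] SUSPICIOUS = {"expression", "javascript:"};

    private AttributeValueCheck() {}

    /** Strips comments and whitespace, then lower-cases the value. */
    static String normalize(String value) {
        String v = COMMENT.matcher(value).replaceAll("");
        v = WHITESPACE_OR_CONTROL.matcher(v).replaceAll("");
        return v.toLowerCase(Locale.ROOT);
    }

    /** Returns true if the normalised value contains a suspicious term. */
    static boolean looksDangerous(String value) {
        String v = normalize(value);
        for (String term : SUSPICIOUS) {
            if (v.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

A check this blunt will also flag innocent values, such as a title attribute that happens to contain the word "expression", so it errs on the side of rejecting too much rather than too little.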
As yet I've not finished work on the code for the filter needed to prevent XSS attacks. Even when I do, I don't know if I'd be willing to bet money on it being 100% effective, and I don't even know if there's a filter out there that is.
What has surprised me is that there doesn't seem to be an existing Java library that will take a String input of HTML and make it safe (while formatting it and correcting errors). You can bet Ruby on Rails and PHP have effective filters, but I just can't find the Java equivalent. While I work on rectifying that you'll just have to wait. I hope you're not a sitting duck in the meantime ;o)
I'm keen to get it done as I'm well aware I need to apply the code in more than a few places on this server!
When I'm happy with the code I'll make it available for testing/download and describe the approach I've taken to implementing its use in Domino in more detail.