Thursday, November 17, 2005

Cross Site Scripting and You

In this era buzzing with "web 2.0", "ajax" and social networks, more Internet users no longer passively "surf" or "browse" the web; they increasingly contribute to it in online forums, portals, aggregators and blogs.

The concept of cross-site scripting (XSS) has been around for quite a while: fun was had, holes were plugged. Yet once in a while, as I troll around some open social network, I still see a few vulnerabilities crop up here and there.

Putting XSS back on our collective radar can't hurt.

In not-too-nerdy terms, some of the sites most vulnerable to XSS are sites which allow users to contribute richly-formatted content. The concept of a "user" is also key: a script that one user sneaks into a page runs in the browser of whoever views that page, with that viewer's logged-in session, so a user account's integrity could get compromised by an XSS vulnerability.
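
To make that concrete, here is a minimal sketch in Python of the baseline defense for a site that does not need rich formatting at all: escape everything a user typed before it goes into the page. The function name render_comment and the markup around it are made up for illustration, not taken from any particular site.

```python
from html import escape


def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment for one comment (hypothetical helper).

    Both fields come from the user, so both are escaped: any markup they
    typed is shown as literal text instead of being interpreted.
    """
    return f"<p><b>{escape(author)}</b>: {escape(body)}</p>"


print(render_comment("eve", "<script>steal()</script>"))
```

Once a site wants to let some markup through, plain escaping is no longer enough, and that is where filtering comes in.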

For more info, the Wikipedia article on XSS has the meat. See also its list of related vulnerabilities at the bottom. I wonder whether the whole HTTP TRACE vulnerability was ever plugged in IE/Mozilla?

On a nerdier note, are there free/open-source libraries in various application platforms such as Java, PHP, Python, Ruby that handle various forms of HTML content parsing and harmful markup/scripting filtering? The tried-and-true Tidy by Raggett sure helps as a foundation.

Here are a few of the things I would try to look out for when allowing any foreign markup to make its way onto my site:

1) filter out all <script...> ... </script> blocks.
2) filter out all event handler html attributes from all html tags. Such attributes always start with the word "on": "onmouseover", "onload", "onclick".
3) filter out all instances of "javascript:" in all HTML attribute values. It's otherwise possible to get funky with "javascript:" URIs.
4) I would also filter out <link .../> and <style /> tags. I've heard of a "javascript:" URI used as the value of a "background" url directive, and that's just nasty. If you really want to allow CSS styling, let them do it inline with a good old "style" attribute. If they get funky with javascript: there, 3) ought to catch it.
5) to be on the safer side, and to avoid annoyances, I'd also remove all basic html document constructs such as "html", "body", "head" and "title", and all complex object embedding constructs such as "object" and "embed".
6) and ensure the resulting html snippet remains clean, valid html.
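
For the curious, steps 1) through 5) can be sketched with nothing but Python's standard html.parser module. This is a rough illustration under my own assumptions, not a vetted sanitizer: the names SnippetFilter and sanitize are made up, the tag lists simply mirror the list above, and step 6) is left to something like Tidy.

```python
from html import escape
from html.parser import HTMLParser

# Tags removed together with everything inside them (steps 1 and 4).
DROP_WITH_CONTENT = {"script", "style"}
# Tags whose markup is removed but whose inner text is kept (steps 4 and 5).
DROP_TAG_ONLY = {"link", "html", "body", "head", "title", "object", "embed"}


class SnippetFilter(HTMLParser):
    """Hypothetical filter: re-emits a snippet minus the risky parts."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def _render(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if name.startswith("on"):  # step 2: event handler attributes
                continue
            if value is None:
                parts.append(name)
                continue
            # Step 3: the parser has already decoded character references;
            # strip whitespace and lowercase before looking for the scheme.
            if "javascript:" in "".join(value.split()).lower():
                continue
            parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + ">"

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif not self.skip_depth and tag not in DROP_TAG_ONLY:
            self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, never as a pair.
        if self.skip_depth or tag in DROP_WITH_CONTENT or tag in DROP_TAG_ONLY:
            return
        self.out.append(self._render(tag, attrs))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag not in DROP_TAG_ONLY:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(snippet: str) -> str:
    parser = SnippetFilter()
    parser.feed(snippet)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="evil()">hi <b>there</b></p><script>alert(1)</script>'))
print(sanitize('<a href="javascript:alert(1)">x</a>'))
```

Going through a parser rather than search-and-replace on the raw text means tricks like mixed-case tag names or entity-encoded "javascript:" get normalized before the checks run. A list of things to remove will always miss something, though, so a list of tags and attributes to allow is probably the sturdier design.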

... what am I leaving out?
