I've always wondered about Mark's advice. The idea is to strip potentially dodgy tags, such as script, embed, object and iframe. The theory is simple: if you don't have a script tag, you can't get hit by a dodgy script. The downside is that I also can't run that really useful script. It's not discerning enough, and it feels too heavy-handed. Surely there's a more elegant approach?
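To make the tag-stripping approach concrete, here is a minimal sketch using Python's standard `html.parser`. The tag list is the one above; `TagStripper` and `strip_dodgy_tags` are hypothetical names, and this is an illustration of the idea, not any particular aggregator's sanitizer.

```python
from html.parser import HTMLParser

# The tags named in the advice; each is dropped along with its contents.
DODGY_TAGS = {"script", "embed", "object", "iframe"}
# <embed> has no closing tag, so it must not open a "skip" region.
VOID_TAGS = {"embed"}


class TagStripper(HTMLParser):
    """Copies HTML through, dropping DODGY_TAGS and everything inside them."""

    def __init__(self):
        # Keep entity references undecoded so "&lt;script&gt;" is re-emitted
        # verbatim instead of being turned back into live markup.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DODGY_TAGS:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> or <embed/>.
        if tag not in DODGY_TAGS and self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DODGY_TAGS:
            if tag not in VOID_TAGS and self.skip_depth > 0:
                self.skip_depth -= 1
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")

    # Comments, declarations and processing instructions fall through to
    # HTMLParser's no-op defaults, so they are dropped (IE conditional
    # comments can hide script).


def strip_dodgy_tags(html: str) -> str:
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `strip_dodgy_tags('<p>Hi<script>alert(1)</script> there</p>')` gives `'<p>Hi there</p>'`. Note that it only knows about tags: the useful script goes along with the dodgy one, and attributes such as `onclick` pass straight through.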
Many aggregators use Internet Explorer's web browser control to display their content. IE supports multiple security zones: local, intranet, internet, and explicitly listed trusted and restricted sites. It works out which zone applies from the URL, and allows or disallows actions based on the zone you're in. Getting the idea yet?
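A toy model of how URL-based zones work might look like the sketch below. The zone numbers mirror urlmon's `URLZONE_*` constants, but the site lists, the dotless-hostname intranet rule and the per-zone action sets are simplified, hypothetical stand-ins for IE's real (user-configurable) settings.

```python
from enum import IntEnum
from urllib.parse import urlparse


class Zone(IntEnum):
    # Numbering mirrors urlmon's URLZONE_* constants.
    LOCAL_MACHINE = 0
    INTRANET = 1
    TRUSTED = 2
    INTERNET = 3
    RESTRICTED = 4  # URLZONE_UNTRUSTED


# Hypothetical site lists; in IE these come from the user's settings.
TRUSTED_SITES = {"partner.example.com"}
RESTRICTED_SITES = {"evil.example.net"}

# Hypothetical, much-simplified policy: which actions each zone permits.
POLICY = {
    Zone.LOCAL_MACHINE: {"script", "activex"},
    Zone.INTRANET: {"script", "activex"},
    Zone.TRUSTED: {"script", "activex"},
    Zone.INTERNET: {"script"},
    Zone.RESTRICTED: set(),
}


def zone_for_url(url: str) -> Zone:
    """Pick a zone purely from the URL, as the browser does."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if parts.scheme == "file":
        return Zone.LOCAL_MACHINE
    if host in RESTRICTED_SITES:
        return Zone.RESTRICTED
    if host in TRUSTED_SITES:
        return Zone.TRUSTED
    if host and "." not in host:
        # Simplified intranet heuristic: dotless host names.
        return Zone.INTRANET
    return Zone.INTERNET


def is_allowed(url: str, action: str) -> bool:
    return action in POLICY[zone_for_url(url)]
```

The point is that the same HTML gets different treatment depending only on the URL it claims to come from: `is_allowed("http://evil.example.net/", "script")` is `False`, while a page loaded from `file:///C:/Temp/item.htm` lands in the local zone and may run script.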
When you host the web browser control and want it to display arbitrary HTML, you either get it to browse to a temporary file or squirt the HTML in through a stream. Either way, the browser doesn't know the post's real URL, so it picks the security zone based on the wrong one. I don't know which zone it would end up using. A default of restricted would be nice, but it's probably going to be local; after all, if you're injecting HTML, something is already running on your local machine, and running arbitrary script is the least of your worries. If it's the most secure zone, you don't need to do anything: IE won't run dodgy scripts or ActiveX objects. If it's the most open, you have a problem.
My aggregator would simply have the web browser control take its security zone from the URL of the post, and let IE handle the rest. (Note that I don't know how to do this. There is an API that lets you handle custom URLs, but I don't know whether you can give an in-memory stream a normal HTTP URL. One way or another, I'm sure you could get it to work.) Now the feed item you're viewing can't do anything you wouldn't allow the site itself to do, and you haven't had to strip anything out, either.
IFrames might pose a small problem, but I suspect they'd work fine: they'd point at a normal HTTP address, so the default URL security manager would kick in and all would be good.
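The delegation idea can be modelled as a small sketch, in Python rather than COM. It assumes the item was written to a temporary file (the hypothetical `content_url`) and that the post's permalink is known (`item_url`); `ItemSecurityManager`, `map_url_to_zone` and `default_zone` are made-up names. In the real control this logic would live in a COM object implementing `IInternetSecurityManager`, offered to the browser through the host's `IServiceProvider`, and it would return `INET_E_DEFAULT_ACTION` for URLs it wants the default manager to handle.

```python
from enum import IntEnum
from typing import Callable
from urllib.parse import urlparse


class Zone(IntEnum):
    LOCAL_MACHINE = 0
    INTERNET = 3
    RESTRICTED = 4


def default_zone(url: str) -> Zone:
    """Crude, hypothetical stand-in for IE's default URL-to-zone mapping."""
    parts = urlparse(url)
    if parts.scheme == "file":
        return Zone.LOCAL_MACHINE
    if (parts.hostname or "").endswith(".example.net"):  # pretend restricted list
        return Zone.RESTRICTED
    return Zone.INTERNET


class ItemSecurityManager:
    """Hypothetical model of a per-item custom security manager.

    Questions about the temporary file the item was rendered into are
    answered as if they were asked about the post's own URL.
    """

    def __init__(self, content_url: str, item_url: str,
                 fallback: Callable[[str], Zone] = default_zone):
        self.content_url = content_url
        self.item_url = item_url
        self.fallback = fallback

    def map_url_to_zone(self, url: str) -> Zone:
        if url == self.content_url:
            url = self.item_url
        # Everything else (iframes, images, links) gets the normal mapping.
        return self.fallback(url)
```

With `ItemSecurityManager("file:///C:/Temp/item1.htm", "http://blog.example.org/2006/01/post")`, the temporary file is treated as Internet zone rather than local, while an iframe pointing at `http://ads.example.net/frame` still goes through the default mapping and lands in restricted.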
Comments are one area where there could actually be a problem. You're now only as secure as the site the feed comes from. If that site were in your trusted zone and let any old user add any old script to its comments, you'd be pretty much done for. Defense in depth would argue that my aggregator shouldn't allow this situation to happen, which might throw a spanner in the works of my otherwise lovely theoretical design.
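One possible way to get some of that defense in depth back, sketched here as an assumption rather than anything the design above prescribes, is to take the site's zone but never render a feed item in a zone more permissive than a chosen floor. `zone_for_feed_item` and the `floor` parameter are hypothetical; the zone numbers again mirror urlmon's `URLZONE_*` values.

```python
from enum import IntEnum


class Zone(IntEnum):
    # With default settings, zones 0-2 are generally granted more than the
    # Internet zone (3), and Restricted (4) gets the least.
    LOCAL_MACHINE = 0
    INTRANET = 1
    TRUSTED = 2
    INTERNET = 3
    RESTRICTED = 4


def zone_for_feed_item(site_zone: Zone, floor: Zone = Zone.INTERNET) -> Zone:
    """Use the site's zone, but never anything looser than `floor`."""
    # Any zone numbered below the floor is raised to it; a restricted site
    # stays restricted.
    return max(site_zone, floor)
```

A trusted site's feed would then render with Internet-zone rules, so script smuggled into its comments gets no more than any other web page would, while passing `floor=Zone.RESTRICTED` gives the locked-down behaviour described below.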
Looking around, I can see that, along with stripping dodgy tags, FeedDemon displays feeds in the local zone, while the latest version of SharpReader displays them in the restricted zone.
IE7's new feed platform does pretty much the same: it strips tags and displays content in the restricted zone (and they recommend implementing a custom security manager for viewing feeds, too). I'm a little surprised they don't lean more heavily on IE's security platform for any of this, but it does make sense. Since they're an OS-wide platform, they need to be as secure as possible, and probably shouldn't leave security up to someone else. What if you use another browser to render the HTML? What if you implement the IE security manager stuff wrong?
I still think it's a nice idea, and it'd be interesting to implement and see if it actually worked.