Documentation for search protocol handlers

by Matt 31. August 2006 14:14

Well, I've found some more documentation. And it's got me a bit nervous.

The IFilter stuff is fine. I don't think that's changed at all since it was first created for NT4.

The problem is with the protocol handlers.

The MSN Toolbar docs point to this documentation for SharePoint Portal Server SDK 2001. It, of course, talks about everything from a SharePoint Portal Server 2001 point of view. And I, of course, have absolutely no idea what's going on. It's written on the assumption that you already know what each part of the system is, what it does, and how the parts talk to each other. It's also 5 years old. Still, it links to the reference info for the required interfaces, so I guess it's a good start. There's even a sample, which seems to be the only place you can get the header files for ISearchProtocol.

Getting a little more modern, there's a slightly expanded version of the same docs for SharePoint Portal Server 2003 - only 3 years out of date. It has a nice little troubleshooting section, and provides a bit more detail on a couple of the interfaces to implement. There's also a bit of a reference section. But we're still talking SharePoint, not WDS.

Then there's the MSN Toolbar docs themselves. Again, it's all a bit everything-you-need-without-any-detail. But it does detail how you register a protocol handler (without telling you how to get the ISearchManager interface). The big problem is that it's for Windows Desktop Search when it was still called MSN Desktop Search. I'm presuming the registration still works...

Things get really interesting when you go to the Windows Shell MSDN library page. Looking in the tree view on the left you'll see nodes for WDS 2.x and WDS 3.0. Only the 2.x version has a landing page. It's described as preliminary documentation, and only talks about IFilter. The interfaces listed in the reference section aren't terribly useful, either.

The WDS 3.0 section only has reference information, and lists 39 interfaces, a couple of which are also mentioned in the SharePoint docs. Unfortunately, its ISearchManager interface has completely different methods to the MSN Toolbar interface of the same name, and offers no way of registering protocol handlers. It looks like WDS has had quite an overhaul for version 3/Vista. This post on MSDN's forums seems to confirm that there's quite a difference.

And that's what's making me nervous. I've currently got WDS 3.0 Beta 2 running, but I think I'd be happier developing towards the 2.x version - because it looks more like the SharePoint stuff that I've got a sample for. Plus, I don't have the headers for the Vista stuff - I don't fancy downloading a beta version of the Platform SDK, especially since it contains the beta WinFx stuff.

(And it looks like I've got an out-of-date PSDK as it is. Here's the link to the Windows Server 2003 R2 Platform SDK - March 2006 Edition.)

So, first plan of action is to uninstall 3.0 and reinstall the latest 2.x version. Then perhaps we can get started with some actual dev work!


Windows Desktop Search

Extending Windows Desktop Search

by Matt 30. August 2006 17:33

One thing Microsoft gets is Platforms. Just about everything they do is a platform. Not for them the quick app. Oh no. It has to be extensible.

I'm writing this in Windows Live Writer. It supports plugins. Each Office application supports addins and both internal and external scripting. Explorer is built entirely out of extensions, as is Visual Studio.

Windows Desktop Search is one such platform. Not only is it a rather smashing search engine (and the evolution of the indexing service found way back in Windows NT 4) but it's also incredibly extensible.

There are three and a half ways of extending WDS. Firstly, you can create a COM object that implements the IFilter interface. This is how WDS can understand and index different file formats. Secondly, if the item to be indexed isn't a file on a file system, you can create an object that implements ISearchProtocol and its related interfaces. And finally, you can implement a preview for your file type.

(The half that's left over is a really interesting one, and something I intend to write up at some point - WDS will use any property handlers you have associated with a file type. This is something that is seriously under-documented (look for the line marked metadata handler), and looks like it's changed drastically under Vista.)

Now, I've got a plan. For a while now, I've wanted to change my feed reader from SharpReader to, well, just about anything else to be honest (but that's for another post). IE7 came along with the Windows Feed Platform (another platform - see what I mean?) and so I've been meaning to migrate my feeds over there. This will happen in due course, but this and WDS together got me thinking - and not just me, either. Brandon LeBlanc asked the question I thought of, as did some unnamed wiki editor.

I'm going to try and write a search protocol for IE7's RSS feed platform, and just to make it more fun, I intend to blog each step in the process. We'll start at the very beginning - getting the project set up and figuring out where the documentation is.


Windows Desktop Search

More SubV2 changes

by Matt 29. August 2006 17:36

Yeah, a little more hacking (and a quick post by way of a test). The RSS feed is now outputting categories, and the author now uses the logon name (which happens to be my name) rather than the site name. I've also embedded the trackback ping URL in there.

It's also getting a little hard to test the RSS feed - I've subscribed to it in IE7, so when I navigate there, I actually get the new feed platform's view of my feed, which means it's all re-parsed and not the actual XML that I return. As a top tip, you can just go into Notepad, go to File -> Open, type a web address (such as the feed's URL) and wait a second or two, and Notepad will open the item from IE's cache. That's a lovely little feature of the common dialog.



FxCop spell check language

by Matt 24. August 2006 16:02

We had discontent on the team today. Can you believe people actually dissing FxCop? Turns out they were just upset about the spelling - it's making us spell in American English, rather than British English.

I can feel the pain - it still makes me wince whenever I see "Favorites". And it's just wrong to spell it "color".

Easily fixed - simply change the spell language in the project options dialog.

Or, if you never use the GUI (e.g. if you've gone continuous integration crazy) then it's in the .fxcop file. It's a normal .NET locale: "en-gb", "en-us" or just plain "en". (//FxCopProject/ProjectOptions/Spelling[@Locale])

One thing that isn't surfaced in the UI is that you can specify multiple locales to check. Simply provide a comma-separated list of locales. FxCop will try to match each word against the first locale in the list. If it doesn't match, it moves on to the next locale. Handy.
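
As a rough sketch, the relevant corner of the .fxcop file would look something like this. Only the element path and the Locale attribute come from the XPath above; everything else in the project file is left out, and the two-locale value is taken on trust from the comma-separated behaviour just described.

```xml
<FxCopProject>
  <ProjectOptions>
    <!-- Other project options omitted. -->
    <!-- Try British English first, then fall back to American English. -->
    <Spelling Locale="en-gb,en-us" />
  </ProjectOptions>
</FxCopProject>
```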

Thanks Reflector!


.net tools | FxCop

You snooze, you lose

by Matt 22. August 2006 17:37

You've got to love internet time.

I've downloaded the Windows Live Writer SDK, grabbed a copy of Wilco Bauwer's syntax highlighter and was just beginning to figure out how to write a nice insert-syntax-highlighted-code-plugin, when someone beat me to it.

In fact, I'm well behind on this one. Check out JefTek's Live Space entry - four syntax highlighting plugins there. I'll have to get downloading.

And I still need to integrate a highlighter with the editor built into the site...


Officially a blog

by Matt 17. August 2006 16:57
Yep. I'm proud to say I've had my first piece of comment spam (and indeed, my first comment):

# xaqmgpksh orjiwd
posted by yrizuq sijep on 16/08/2006 22:40:18 :

dwpt exwjrbk atuj thzuyeqsw qscuiz yrsqxku clztf

It's gibberish and the website doesn't exist, but it's contact.


Windows Live Writer progress

by Matt 17. August 2006 16:54

Me, yesterday:

So, bugs. [...] Another biggie is editing a post and hitting publish creates a new post, rather than updating the original.


And categories work too, and the time no longer resets to 01/01/0001. Trackbacks don't though - they currently only work for initially saved posts from the web site. Fixing this one will require moving files around between assemblies.

I'm really getting to like Windows Live Writer. I've even had my first idea for a plugin...



Windows Live Writer + SubV2 = Not bad, really

by Matt 16. August 2006 17:43

So, as previously noted, I've been trying to get Windows Live Writer working nicely with SubV2. Once I realised that the MetaWeblog API is actually plumbed in, I tried it all out. An hour or two of tweaks to the code later, and it's working rather nicely.

There were a couple of problems. Null data in the XML-RPC structs was not being ignored, so an exception was thrown. Also, posts weren't getting created because the security was checking that you were logged in on the website, and well, you aren't. For now, I've just commented out the check (not in the least bit dodgy!).

I've even added RSD support, so that WLW can just look at the homepage and auto-configure everything. Super.

The RSD support comes from a new http handler Rsd.ashx. Unfortunately, the xml content is written from the code. I'm not sure how to write markup in an ashx file - I just got compile errors. I presume you could wrap everything in style controls, and take it from there. (I also need to add output caching. Add another todo to the list)
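
For anyone who hasn't met RSD before, this is roughly the shape of the document a handler like Rsd.ashx has to return. It's a sketch based on the RSD 1.0 format rather than SubV2's actual output: the example.com addresses and the blogID are made-up placeholders, and pointing apiLink at MetaBlog.ashx is an assumption based on the name of SubV2's MetaWeblog handler.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rsd version="1.0" xmlns="http://archipelago.phrasewise.com/rsd">
  <service>
    <!-- Name and home of the blog engine (placeholder link). -->
    <engineName>SubV2</engineName>
    <engineLink>http://example.com/</engineLink>
    <!-- The blog's own homepage (placeholder). -->
    <homePageLink>http://example.com/blog/</homePageLink>
    <apis>
      <!-- One entry per supported posting API; WLW picks the preferred one. -->
      <api name="MetaWeblog" preferred="true"
           apiLink="http://example.com/blog/MetaBlog.ashx"
           blogID="1" />
    </apis>
  </service>
</rsd>
```

The auto-configuration works because the homepage advertises the document in its head, with something along the lines of <link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://example.com/blog/Rsd.ashx" /> (again, a placeholder address).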

Part of the wizard tries to create a post, gets the homepage and works out the styles so that it can provide a web page preview. This fails. Creating the post is fine, but the fetch of the page doesn't work out too well. SubV2 makes heavy use of caching, but some of the cache code is very broken. I've already fixed a bug where cached items don't expire (the value in the web.config is never passed in - evidence of this is that I posted a comment on one of Darren's posts two days ago, and it still hasn't appeared!) and I've cleaned up a lot of issues where the cache wasn't invalidated when posts and categories were created, deleted or edited. One I haven't sorted out is the list of posts - this doesn't get invalidated. Which means that the new post created by WLW doesn't get displayed on the homepage, and the whole auto-discovery process falls on its face. Quickest way to fix? Set the cache timeout to 10 seconds.

So, bugs. The most obscure I've found so far: if you're at the end of the file and highlight text by going left with shift + the cursor keys, you can't de-select it by going right. Another biggie is editing a post and hitting publish creates a new post, rather than updating the original. And when you get the latest posts, they don't get persisted anywhere, which is a shame.

And SubV2 doesn't support the newMediaObject method, so no uploading pictures.

But all in? I rather like it. Beats editing in FreeTextBox on a web page that times out after 10 minutes. Just got to fix some of those bugs...



Spot the working code

by Matt 16. August 2006 17:16

It's like spot the bug, only the code actually works.

Like everyone else and their dog, I decided to try out Windows Live Writer. But to do that, I needed my blog to support some sort of posting API. My blog software (which, as I'm sure you'll remember, is SubV2) supposedly has support for the MetaWeblog API. There's a file called MetaBlog.ashx, but opening it up in Visual Studio showed the following:

<%@ WebHandler Language="C#" Class="BlueFenix.MetaWeblog.MetaBlog" %>

using System;
using System.Web;

public class MetaBlog : IHttpHandler {
    public void ProcessRequest (HttpContext context) {
        context.Response.ContentType = "text/plain";
        context.Response.Write("Hello World");
    }

    public bool IsReusable {
        get {
            return false;
        }
    }
}

This led me to add "wire up metaweblog stuff" to my list of things To Do to SubV2 (number 15 of 69, if you're counting). Looks fairly boilerplate and empty, right?

"Hello world". Sheesh.

So, imagine my surprise when I actually navigated to MetaBlog.ashx, and I got a proper page back. How could this "Hello world" http handler actually do anything?

The secret is in the Class attribute on the first line. It's the actual type used for the http handler. Normally, it names the class that's defined in the body of the ashx file, which gets compiled and then used. Here it's different. The class in the ashx file gets compiled and thrown away, and the BlueFenix.MetaWeblog.MetaBlog class is used instead. This class happens to be the one that implements IHttpHandler, and the rest of the MetaWeblog API.

The devil really is in the details.

A better solution for this already exists - instead of having a virtually empty MetaBlog.ashx file, add the following into the web.config in the <httpHandlers> element:

<add verb="*" path="MetaBlog.ashx" type="BlueFenix.MetaWeblog.MetaBlog" />

I can now delete the MetaBlog.ashx file. Any requests for MetaBlog.ashx will get mapped from the web.config to the required type.
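
In context, that line sits inside system.web. This is a minimal sketch, assuming a classic ASP.NET web.config with nothing else in it; a real file will have plenty of other settings around it. If the class lives in a separate assembly, the type attribute may also need the assembly name appended after a comma, which depends on the project and so isn't shown here.

```xml
<configuration>
  <system.web>
    <httpHandlers>
      <!-- Route every HTTP verb for MetaBlog.ashx straight to the compiled handler type. -->
      <add verb="*" path="MetaBlog.ashx" type="BlueFenix.MetaWeblog.MetaBlog" />
    </httpHandlers>
  </system.web>
</configuration>
```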


Scripts in feeds

by Matt 13. August 2006 17:02

There's a bit of a storm in a teacup going on at the moment about potential security problems with javascript in RSS feeds. This is not a new problem - Mark Pilgrim demonstrated and wrote about this in 2003. It's a shame that it's still an issue (Apple, Linux, Firefox, take note - security is very hard to get right).

I've always wondered about Mark's advice. The idea is to strip potentially dodgy tags, such as script, embed, object and iframe. The theory is simple: if you don't have a script tag, you can't get hit by a dodgy script. The downside is that I also can't run that really useful script. It's not discerning enough. And it just feels too heavy-handed - surely there's a more elegant approach?

Many aggregators make use of Internet Explorer's web browser control to display their content. IE supports multiple zones for security - local, intranet, internet and explicitly listed trusted and restricted sites. It knows all these based on the URL. And it allows or disallows actions based on what zone you're in. Getting the idea yet?

When you host the web browser control and you want it to display arbitrary HTML, you get it to browse to a temporary file, or squirt the HTML in through a stream. In either case, the browser won't know what the proper URL should be, and will use the wrong security zone. I don't know which one it would use - a default of restricted would be nice, but it's probably going to be local - after all, if you're injecting HTML in, then something's already running on your local machine, and running arbitrary script is the least of your worries. Assuming it's the most secure, then you don't need to do anything - IE won't run dodgy scripts or ActiveX objects. If it's the most open, then you have a problem.

My aggregator would simply set the security zone of the web browser control to the URL of the post - and let IE handle the rest (note that I don't know how to do this. There is an API that lets you handle custom URLs, but I don't know if you can specify a normal HTTP URL for an in memory stream. One way or another, I'm sure you could get this to work). Now the viewed feed item can't do anything you wouldn't allow the site to do. And you haven't had to strip anything out, either.

IFrames might provide a small issue, but I suspect they'd work fine because they'd link to a normal HTTP address, and the default URL security manager would kick in and all would be good.

Comments are one area where there could actually be a problem. You're now only as secure as the site the feed is coming from. If it was in the trusted zone, and allowed any old user to add any old script in its comments, then you're pretty much done for. Defense in depth would argue that my aggregator should not allow this situation to happen, which might throw a spanner in my otherwise lovely theoretical design.

Looking around, I can see that, along with stripping dodgy tags, FeedDemon displays in the local zone and the latest version of SharpReader displays feeds in the restricted zone.

IE7's new feed platform pretty much does the same - strips tags and displays in restricted (and they do recommend implementing a custom security manager for viewing feeds as well). I am a little surprised they don't lean a little more heavily on IE's security platform for any of this, though. But, it does kind of make sense, too. Since they are an OS wide platform, they need to be as secure as possible, and probably shouldn't leave the security up to someone else. What if you use another browser to render the HTML? What if you implement the IE security manager stuff wrong?

I still think it's a nice idea, and it'd be interesting to implement and see if it actually worked.


