Extending Windows Desktop Search

by Matt 30. August 2006 17:33

One thing Microsoft gets is Platforms. Just about everything they do is a platform. Not for them the quick app. Oh no. It has to be extensible.

I'm writing this in Windows Live Writer. It supports plugins. Each Office application supports addins and both internal and external scripting. Explorer is built entirely out of extensions, as is Visual Studio.

Windows Desktop Search is one such platform. Not only is it a rather smashing search engine (and the evolution of the indexing service found way back in Windows NT 4) but it's also incredibly extensible.

There are three and a half ways of extending WDS. Firstly, you can create a COM object that implements the IFilter interface. This is how WDS can understand and index different file formats. And, if the item to be indexed isn't a file on a file system, you can create an object that implements ISearchProtocol and its related interfaces. And finally, you can implement a preview for your file type.

(The half that's left over is a really interesting one, and something I intend to write up at some point - WDS will use any property handlers you have associated with a file type. This is something that is seriously under-documented (look for the line marked metadata handler), and looks like it's changed drastically under Vista.)

Now, I've got a plan. For a while now, I've wanted to change my feed reader from Sharpreader to, well, just about anything else to be honest (but that's for another post). IE7 came along with the Windows Feed Platform (another platform - see what I mean?) and so I've been meaning to migrate my feeds over here. This will happen in due course, but this and WDS together got me thinking - and not just me, either. Brandon LeBlanc asked the question I thought of, as did some unnamed wiki editor.

I'm going to try and write a search protocol for IE7's RSS feed platform, and just to make it more fun, I intend to blog each step in the process. We'll start at the very beginning - getting the project set up and figuring out where the documentation is.


Windows Desktop Search

More SubV2 changes

by Matt 29. August 2006 17:36

Yeah, a little more hacking (and a quick post by way of a test). The RSS feed is now outputting categories, and the author now uses the logon name (which happens to be my name) rather than the site name. I've also embedded the trackback ping url in there.

It's also getting a little hard to test the RSS feed - I've subscribed to it in IE7, so when I navigate there, I actually get the new feed platform's view of my feed, which means it's all re-parsed and not the actual xml that I return. As a top tip, you can go into Notepad, choose File -> Open, type a web address such as http://sticklebackplastic.com/Rss.ashx, wait a second or two, and Notepad will open the item from IE's cache. That's a lovely little feature of the common dialog.



FxCop spell check language

by Matt 24. August 2006 16:02

We had discontent on the team today. Can you believe people actually dissing FxCop? Turns out they were just upset about the spelling - it's making us spell in American English, rather than British English.

I can feel the pain - it still makes me wince whenever I see "Favorites". And it's just wrong to spell it "color".

Easily fixed - simply change the spell language in the project options dialog.

Or, if you never use the GUI (e.g. if you've gone continuous integration crazy) then it's in the .fxcop file. It's a normal .NET locale: "en-gb", "en-us" or just plain "en". (//FxCopProject/ProjectOptions/Spelling[@Locale])

One thing that isn't surfaced in the UI is that you can specify multiple locales to check. Simply provide a comma separated list of locales. FxCop will try to match each word against the first in the list. If it doesn't match, it moves on to the next locale. Handy.
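Put together with the XPath above, the relevant fragment of the .fxcop file looks something like this (a sketch - the surrounding elements in a real project file will contain plenty more settings):

```xml
<FxCopProject>
  <ProjectOptions>
    <!-- Check British English first; fall back to American English
         for any word the first locale doesn't recognise. -->
    <Spelling Locale="en-gb,en-us" />
  </ProjectOptions>
</FxCopProject>
```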

Thanks Reflector!


.net tools | FxCop

You snooze, you lose

by Matt 22. August 2006 17:37

You've got to love internet time.

I've downloaded the Windows Live Writer SDK, grabbed a copy of Wilco Bauwer's syntax highlighter and was just beginning to figure out how to write a nice insert-syntax-highlighted-code-plugin, when someone beat me to it.

In fact, I'm well behind on this one. Check out JefTek's Live Space entry - four syntax highlighting plugins there. I'll have to get downloading.

And I still need to integrate a highlighter with the editor built into the site...


Officially a blog

by Matt 17. August 2006 16:57
Yep. I'm proud to say I've had my first piece of comment spam (and indeed, my first comment):

# xaqmgpksh orjiwd
posted by yrizuq sijep on 16/08/2006 22:40:18 :

dwpt exwjrbk atuj thzuyeqsw qscuiz yrsqxku clztf

It's gibberish and the website doesn't exist, but it's contact.


Windows Live Writer progress

by Matt 17. August 2006 16:54

Me, yesterday:

So, bugs. [...] Another biggie is editing a post and hitting publish creates a new post, rather than updating the original.


That's now fixed. And categories work too, and the time no longer resets to 01/01/0001. Trackbacks don't, though - they currently only work for posts initially saved from the web site. Fixing this one will require moving files around between assemblies.

I'm really getting to like Windows Live Writer. I've even had my first idea for a plugin...



Windows Live Writer + SubV2 = Not bad, really

by Matt 16. August 2006 17:43

So, as previously noted, I've been trying to get Windows Live Writer working nicely with SubV2. Once I realised that the MetaWeblog API is actually plumbed in, I tried it all out. An hour or two of tweaks to the code later, and it's working rather nicely.

There were a couple of problems. Null data in the XMLRPC structs were not being ignored, so an exception was thrown. Also, posts weren't getting created because the security was checking that you were logged in on the website, and well, you aren't. For now, I've just commented out the check (not in the least bit dodgy!)

I've even added RSD support, so that WLW can just look at the homepage and auto-configure everything. Super.

The RSD support comes from a new http handler Rsd.ashx. Unfortunately, the xml content is written from the code. I'm not sure how to write markup in an ashx file - I just got compile errors. I presume you could wrap everything in asp.net style controls, and take it from there. (I also need to add output caching. Add another todo to the list)
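For reference, an RSD document is just a small piece of XML that points a client at the blog's posting API. A minimal sketch of what Rsd.ashx emits (the engine name, URLs and blogID here are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rsd version="1.0" xmlns="http://archipelago.phrasewise.com/rsd">
  <service>
    <engineName>SubV2</engineName>
    <homePageLink>http://sticklebackplastic.com/</homePageLink>
    <apis>
      <!-- Tells WLW which API to use and where its endpoint lives -->
      <api name="MetaWeblog" preferred="true"
           apiLink="http://sticklebackplastic.com/MetaBlog.ashx" blogID="1" />
    </apis>
  </service>
</rsd>
```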

Part of the wizard tries to create a post, gets the homepage and works out the styles so that it can provide a web page preview. This fails. Creating the post is fine, but the fetch of the page doesn't work out too well. SubV2 makes heavy use of caching, but some of the cache code is very broken. I've already fixed a bug where cached items don't expire (the value in the web.config is never passed in - evidence of this is that I posted a comment on one of Darren's posts two days ago, and it still hasn't appeared!) and I've cleaned up a lot of issues where the cache wasn't invalidated when posts and categories were created, deleted or edited. One I haven't sorted out is the list of posts, which never gets invalidated. Which means that the new post created by WLW doesn't get displayed on the homepage, and the whole auto-discovery process falls on its face. Quickest way to fix? Set the cache timeout to 10 seconds.

So, bugs. The most obscure I've found so far: if you're selecting text with shift and the cursor keys at the end of the file, and you move left, you can't then de-select text by moving right. Another biggie is that editing a post and hitting publish creates a new post, rather than updating the original. And when you get the latest posts, they don't get persisted anywhere, which is a shame.

And SubV2 doesn't support the newMediaObject method, so no uploading pictures.

But all in? I rather like it. Beats editing in FreeTextBox on a web page that times out after 10 minutes. Just got to fix some of those bugs...



Spot the working code

by Matt 16. August 2006 17:16

It's like spot the bug, only the code actually works.

Like everyone else and their dog, I decided to try out Windows Live Writer. But to do that, I needed my blog to support some sort of posting API. My blog software (which, as I'm sure you'll remember, is SubV2) supposedly has support for the MetaWeblog API. There's a file called MetaBlog.ashx, but opening it up in Visual Studio showed the following:

<%@ WebHandler Language="C#" Class="BlueFenix.MetaWeblog.MetaBlog" %>

using System;
using System.Web;

public class MetaBlog : IHttpHandler {

    public void ProcessRequest (HttpContext context) {
        context.Response.ContentType = "text/plain";
        context.Response.Write("Hello World");
    }

    public bool IsReusable {
        get {
            return false;
        }
    }

}

This led me to add "wire up metaweblog stuff" to my list of things To Do to SubV2 (number 15 of 69, if you're counting). Looks fairly boilerplate and empty, right?

"Hello world". Sheesh.

So, imagine my surprise when I actually navigated to MetaBlog.ashx, and I got a proper page back. How could this "Hello world" http handler actually do anything?

The secret is in the Class attribute on the first line. It's the actual type used for the http handler. Normally, it's named after the class that's also in the body of the ashx file, which gets compiled and then used. Here, it's different. The class in the ashx file gets compiled and thrown away, and the BlueFenix.MetaWeblog.MetaBlog class is used instead. This class happens to be the one that implements IHttpHandler, and the rest of the MetaWeblog API.

The devil really is in the details.

A better solution for this already exists - instead of having a virtually empty MetaBlog.ashx file, add the following into the web.config in the <httpHandlers> element:

<add verb="*" path="MetaBlog.ashx" type="BlueFenix.MetaWeblog.MetaBlog" />

I can now delete the MetaBlog.ashx file. Any requests for MetaBlog.ashx will get mapped from the web.config to the required type.


Scripts in feeds

by Matt 13. August 2006 17:02

There's a bit of a storm in a teacup going on at the moment about potential security problems with javascript in RSS feeds. This is not a new problem - Mark Pilgrim demonstrated and wrote about this in 2003. It's a shame that it's still an issue (Apple, Linux, Firefox, take note - security is very hard to get right).

I've always wondered about Mark's advice. The idea is to strip potentially dodgy tags, such as script, embed, object and iframe. The theory is simple: if you don't have a script tag, you can't get hit by a dodgy script. The downside is that I also can't run that really useful script. It's not discerning enough. And it just feels too heavy handed - surely there's a more elegant approach?

Many aggregators make use of Internet Explorer's web browser control to display their content. IE supports multiple zones for security - local, intranet, internet and explicitly listed trusted and restricted sites. It knows all these based on the URL. And it allows or disallows actions based on what zone you're in. Getting the idea yet?

When you host the web browser control and you want it to display arbitrary HTML, you get it to browse to a temporary file, or squirt the HTML in through a stream. In either case, the browser won't know what the proper URL should be, and will use the wrong security zone. I don't know which one it would use - a default of restricted would be nice, but it's probably going to be local - after all, if you're injecting HTML in, then something's already running on your local machine, and running arbitrary script is the least of your worries. Assuming it's the most secure, then you don't need to do anything - IE won't run dodgy scripts or ActiveX objects. If it's the most open, then you have a problem.

My aggregator would simply set the security zone of the web browser control to the URL of the post - and let IE handle the rest (note that I don't know how to do this. There is an API that lets you handle custom URLs, but I don't know if you can specify a normal HTTP URL for an in memory stream. One way or another, I'm sure you could get this to work). Now the viewed feed item can't do anything you wouldn't allow the site to do. And you haven't had to strip anything out, either.

IFrames might pose a small issue, but I suspect they'd work fine because they'd link to a normal HTTP address, and the default URL security manager would kick in and all would be good.

Comments are one area where there could actually be a problem. You're now only as secure as the site the feed is coming from. If it was in the trusted zone, and allowed any old user to add any old script in its comments, then you're pretty much done for. Defence in depth would argue that my aggregator should not allow this situation to happen, which might throw a spanner in my otherwise lovely theoretical design.

Looking around, I can see that, along with stripping dodgy tags, FeedDemon displays in the local zone and the latest version of SharpReader displays feeds in the restricted zone.

IE7's new feed platform pretty much does the same - strips tags and displays in restricted (and they do recommend implementing a custom security manager for viewing feeds as well). I am a little surprised they don't lean a little more heavily on IE's security platform for any of this, though. But, it does kind of make sense, too. Since they are an OS wide platform, they need to be as secure as possible, and probably shouldn't leave the security up to someone else. What if you use another browser to render the HTML? What if you implement the IE security manager stuff wrong?

I still think it's a nice idea, and it'd be interesting to implement and see if it actually worked.


Living the Test Driven Dream

by Matt 6. August 2006 18:23

I find myself in an odd situation.

I'm doing maintenance programming.

And I'm enjoying it.

Maintenance programming sucks. It's the crazy art of taking someone else's code, guessing what it does, closing your eyes, gritting your teeth and changing something that inevitably breaks as soon as you get it live. It's a dangerous place, fraught with peril. It's depressing and demotivating and dull. Everyone has to do it at some point, and everyone hates it.

But I've made a discovery. Maintenance programming doesn't suck; it's not the work itself that's so bad, it's how you do it. Or rather, it's how it makes you feel.

Look at that description of maintenance programming again. OK, it's somewhat exaggerated and has a tongue planted firmly in its cheek, but it still holds true. Developers do hate it, and get demotivated and depressed and demoralised. And here's the key - look at all that negative emotion! Things tend to go wrong when you do maintenance programming. And you're looking at other people's code and you don't like how it's structured, and you don't understand it, and it's all just so much stress and fear and general bad vibes.

This emotional impact shouldn't be under-estimated. This stress and fear is directly contributing to the depressed and demoralised feeling. And it's dull because you can't change the structure to something nicer because you're too frightened of breaking things! Who really wants to work in these conditions?

That's right. Maintenance programming is rubbish because of the emotional impact it has on you.

The good news is that this can all be fixed, and very easily. All it needs is some proper unit tests. And doing things test driven is even better.

I've had the luxury of spending the last couple of years working almost exclusively in .net, and there I've been really rather strict about unit testing, and test driven development (not an easy skill to learn, but once things have clicked, there's no going back). But that was all new code. Or so I thought at the time. Looking back it's fairly obvious that I was doing a lot of maintenance programming then, too. I haven't been writing brand spanking new code every single day for the past couple of years! I was changing things and tweaking things and fixing things, just like maintenance programming. But it didn't feel like it. There was no fear, no stress, no depression, no demotivation. And that's because I had a huge safety net of unit tests. Since the tests had been there before the code was even written, this was just taken for granted, and changes were cheap and safe.

But what if the tests aren't there already? This is where Michael Feathers becomes your new best friend. Go get his book Working Effectively with Legacy Code. It deals entirely with this situation, and is full of guidance and advice on what to do. It defines legacy code as code without tests, and the basic rule of the book is that you don't touch a single line of existing code until you've got it covered in tests. The hard part is inserting tests where they weren't designed to go, and that's where a large amount of the book comes in handy.

My current project came with some "prototype" code - a single 5000 line C file written by someone else. I had to get this production ready. Are you feeling the fear yet? Me too.

There was only one thing for it. Unit testing in C++. I'm now using Eclipse with a rather nice CxxTest plugin. I've got the full red-green-refactor cycle going, loads of unit tests, and it's all working really rather well.

And the most surprising thing was how it all made me feel.

I'm actually enjoying maintenance programming. I'm not worried about breaking something, because I've got confidence in my tests. I'm not worried about lack of knowledge, because I just write a test that demonstrates how it works. And, even more positively, it's incredibly satisfying to keep the red/green bar green, and to watch the number of unit tests increase, knowing that the quality and maintainability of the project is increasing with every new test. Instead of being limited by negativity, I've got all the fun of deciding how to best improve the structure of the code.

Who would have thought coding could be quite this emotional?


