
Indexing Windows Live Writer posts


While googling for something else, I came across a post pointing out that Windows Live Writer's saved posts aren't being indexed. Or rather, their contents aren't - only the file properties are. Which is odd, because WLW comes with an IFilter - a plugin that exposes the contents of a .wpost file to the Windows Search indexer.


The article mentions that you can fix this by going to Indexing Options in the Control Panel (then Advanced -> File Types), selecting the wpost extension, and changing the radio button from "Index Properties Only" to "Index Properties and File Contents".

This works, but not the way you'd expect: it doesn't use the Windows Live Writer IFilter.

When you select "Index Properties Only", the registered filter is removed from the file type. If a file has no filter registered, the indexer will use the system provided "File Properties Filter", which extracts various properties such as filename, size, dates (and maybe the OLE DocFile structured storage properties) but doesn't touch the contents.

Selecting "Index Properties and File Contents" doesn't magically wire up the correct filter. Instead, it registers the "Plain Text Filter", which just extracts as much text out of the file as it can, and then hands it to the indexer as content. You can use it on arbitrary binary files, but it doesn't understand the file format, so it can't output more advanced properties, such as Author, Subject or Perceived Type. If you try to use the advanced search features of Explorer to find blog posts with a certain subject, it will fail. Not too much of a hardship, perhaps, because the text will still match a full content search, but without the Perceived Type, the indexer doesn't know if the file is a document, email, picture, audio, video or whatever. Bang goes your filtering.

We can fix this, but let's see why it wasn't registered in the first place. A great tool to help with this is Citeknet's IFilter Explorer.

[Screenshot: IFilter Explorer - Citeknet]

Take a look for the .wpost extension. It's not there. Now we know why the proper filter wasn't being used - it's not registered.

You might have noticed the bewildering array of tabs across the top of the list. Windows Search shares a history with a long line of search products from Microsoft, from server side search engines such as SQL Server full text search, SharePoint and Exchange search, to desktop products: Windows Search (3.x), Windows Desktop Search (2.x - MSN Desktop Search), Indexing Service and even the aborted WinFS.

On a hunch, check out Windows Desktop Search 2.x.

There it is. The .wpost extension has the WebPostFilter class registered against it.

And that's because despite sharing ancestry and the IFilter technology, registration between the different implementations can be subtly (and not so subtly) different. For example, the SQL Server registration needs extra data in a system table.

There does appear to be a common thread amongst registrations, though, and this is partly described in the docs for the current version of Windows Search. Namely, registration hangs off the file extension in the registry, or off the document type pointed to by the file extension. Or even off the MIME content type (which I didn't know worked, but it explains why so many XML files are indexed).

Windows Desktop Search 2.x simply had some overrides that were checked before the system defined places, and the Windows Live Writer developers chose to register it there:

HKLM\SOFTWARE\Microsoft\RSSearch\ContentIndexCommon\Filters\Extension\.wpost

Now we know what the problem is, it's pretty straightforward to fix. We just need to deal with the mind-bogglingly odd way of registering IFilters.

Hanging off the file extension, the document type or the MIME type, you need to add a key called "PersistentHandler". Its default value is a GUID that is itself registered under HKEY_CLASSES_ROOT\CLSID. That GUID's key has a subkey called PersistentAddinsRegistered, which has another subkey named after the interface IID for IFilter. The default value of that subkey is the CLSID of the IFilter COM object.
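To make that chain of hops a bit more concrete, here's a rough Python sketch that follows it for a file extension only (it ignores the document type and MIME type routes). The names resolve_filter_clsid, get_default and hkcr_default are mine, made up for illustration, and this is only my reading of the registry layout described above, not how the indexer itself is implemented.

```python
import sys

# IID of the IFilter interface, as quoted later in this post.
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00DD010655AF}"


def resolve_filter_clsid(extension, get_default):
    """Follow extension -> PersistentHandler -> IFilter CLSID.

    get_default(subkey) is a hypothetical callback that returns the default
    value of a key under HKEY_CLASSES_ROOT, or None if the key is missing.
    """
    handler = get_default(extension + "\\PersistentHandler")
    if not handler:
        return None  # nothing registered directly on the extension
    return get_default(
        "CLSID\\" + handler + "\\PersistentAddinsRegistered\\" + IID_IFILTER
    )


def hkcr_default(subkey):
    """Read a default value from the live registry (Windows only)."""
    import winreg  # standard library, but only present on Windows

    try:
        return winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, subkey)
    except OSError:
        return None  # key does not exist


if __name__ == "__main__" and sys.platform == "win32":
    # Prints None while the filter is unregistered for .wpost.
    print(resolve_filter_clsid(".wpost", hkcr_default))
```

Passing the lookup in as a callback means the same function can be pointed at a plain dictionary, which is handy for checking the logic away from a Windows box.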

Phew.

I have absolutely no idea why they added that bonkers level of abstraction, but it's been there for years, so who are we to argue with tradition? To make it easy, save this as a .reg file and double click:

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\.wpost]

[HKEY_CLASSES_ROOT\.wpost\PersistentHandler]
@="{60734E5A-7C25-479f-B101-F14DEAF5ACB6}"

[HKEY_CLASSES_ROOT\CLSID\{60734E5A-7C25-479f-B101-F14DEAF5ACB6}]
@="Windows Live Writer persistent handler"

[HKEY_CLASSES_ROOT\CLSID\{60734E5A-7C25-479f-B101-F14DEAF5ACB6}\PersistentAddinsRegistered]

[HKEY_CLASSES_ROOT\CLSID\{60734E5A-7C25-479f-B101-F14DEAF5ACB6}\PersistentAddinsRegistered\{89BCB740-6119-101A-BCB7-00DD010655AF}]
@="{4DFA66FF-1EE1-4BAF-A034-0023FB7372EB}"

[HKEY_CLASSES_ROOT\CLSID\{60734E5A-7C25-479f-B101-F14DEAF5ACB6}\PersistentHandler]
@="{60734E5A-7C25-479f-B101-F14DEAF5ACB6}"

Oh, and that PersistentHandler GUID? Brand new one. Never before used. ({60734...} that is. {89BCB...} is the IID for IFilter and {4DFA6...} is the CLSID of the Windows Live Writer filter.)
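If you ever need the same dance for another extension, the text of the file can be generated rather than typed. This is only a sketch: build_reg_text is a made-up helper, it mirrors the keys I used above, it assumes the description contains no quotes or backslashes (it does no escaping), and it mints a fresh persistent handler GUID with uuid4 when you don't supply one. The first line is the header regedit expects at the top of a version 5 .reg file, with each key on a single unwrapped line.

```python
import uuid

# IID of the IFilter interface, as quoted in this post.
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00DD010655AF}"


def build_reg_text(extension, filter_clsid, description, handler_guid=None):
    """Return .reg file text that registers an IFilter for one extension."""
    if handler_guid is None:
        # Assumption: any freshly generated GUID will do for the handler.
        handler_guid = "{" + str(uuid.uuid4()).upper() + "}"
    handler_key = "HKEY_CLASSES_ROOT\\CLSID\\" + handler_guid
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        # The extension points at the persistent handler...
        "[HKEY_CLASSES_ROOT\\" + extension + "\\PersistentHandler]",
        '@="' + handler_guid + '"',
        "",
        "[" + handler_key + "]",
        '@="' + description + '"',
        "",
        # ...and the handler points at the IFilter COM class.
        "[" + handler_key + "\\PersistentAddinsRegistered\\" + IID_IFILTER + "]",
        '@="' + filter_clsid + '"',
        "",
        # Self-reference, copied from the file in this post.
        "[" + handler_key + "\\PersistentHandler]",
        '@="' + handler_guid + '"',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_reg_text(
        ".wpost",
        "{4DFA66FF-1EE1-4BAF-A034-0023FB7372EB}",
        "Windows Live Writer persistent handler",
        handler_guid="{60734E5A-7C25-479f-B101-F14DEAF5ACB6}",
    ))
```

Save the printed text as a .reg file and import it as before. As with the hand-written version, it only helps if the filter's COM class is already installed on the machine.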

[Screenshot: Advanced Options]

Now you just have to get the indexer to re-index those files, and Bob's yer uncle. I took the lazy route, and just rebuilt the whole index (Control Panel -> Indexing Options -> Advanced -> Rebuild).

Painless, eh? What I want to know now, is what does the null filter do?

posted 4/9/2008 4:44:43 PM

 
