Implementing IUrlAccessor

by matt 27. November 2006 15:35

So, if you tilt your head, squint a little bit and use just a dash of imagination, you can say that ISearchProtocol::CreateAccessor is kind of analogous to CreateFile - it abstracts away the access to what the url refers to just as CreateFile abstracts away the accessing of the file system and the hard disk.

But while CreateFile gives you a handle you can pass in to other API functions, CreateAccessor returns back an instance of the IUrlAccessor interface.

And this is where the whole CreateFile analogy breaks down somewhat. Didn't last long, really, did it? IUrlAccessor is not intended to be an equivalent file system API for a url protocol. Instead, it's about getting access to any of the url's data that's required for indexing - "file" metadata (size, last modified date), security data and actual "file" contents. (When I'm saying "file" it's really just shorthand for "resource referred to by the url passed to CreateAccessor".)

This simple metadata is available directly from the interface (GetSize, GetLastModified, GetSecurityDescriptor), but getting at the contents is a bit more work.

Obviously, the indexer cannot know about the format of all "files" it's asked to index. Especially when you consider that some files contain just content (such as plain text), some contain only metadata (such as mp3 files) and some contain both (e.g. Word files). We need another layer of abstraction. And that's where IFilter comes in.

The IFilter interface is called by the indexer to retrieve content and metadata from the underlying data source (file/url). The primary purpose of IUrlAccessor is to retrieve an IFilter for the resource represented by the url. So let's take a closer look at IUrlAccessor:

interface IUrlAccessor: IUnknown
{
    HRESULT AddRequestParameter([in] PROPSPEC *pSpec,
                                [in] PROPVARIANT *pVar);
    HRESULT GetDocFormat([out, length_is(*pdwLength), size_is(dwSize)] WCHAR wszDocFormat[],
                         [in] DWORD dwSize,
                         [out] DWORD *pdwLength);
    HRESULT GetCLSID([out] CLSID *pClsid);
    HRESULT GetHost([out, length_is(*pdwLength), size_is(dwSize)] WCHAR wszHost[],
                    [in] DWORD dwSize,
                    [out] DWORD *pdwLength);
    HRESULT IsDirectory();
    HRESULT GetSize([out] ULONGLONG *pllSize);
    HRESULT GetLastModified([out] FILETIME *pftLastModified);
    HRESULT GetFileName([out, length_is(*pdwLength), size_is(dwSize)] WCHAR wszFileName[],
                        [in] DWORD dwSize,
                        [out] DWORD *pdwLength);
    HRESULT GetSecurityDescriptor([out, size_is(dwSize)] BYTE *pSD,
                                  [in] DWORD dwSize,
                                  [out] DWORD *pdwLength);
    HRESULT GetRedirectedURL([out, length_is(*pdwLength), size_is(dwSize)] WCHAR wszRedirectedURL[],
                             [in] DWORD dwSize,
                             [out] DWORD *pdwLength);
    HRESULT GetSecurityProvider([out] CLSID *pSPClsid);
    HRESULT BindToStream([out] IStream **ppStream);
    HRESULT BindToFilter([out] IFilter **ppFilter);
};

It's a bit of an odd interface, really - you're actually not expected to implement all of it. Methods that don't make sense for your implementation should return E_NOTIMPL.
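A minimal sketch of that idea follows. It is deliberately not real COM code: HResult, the k* constants and SketchAccessor are stand-ins defined here so the snippet compiles without the Windows headers (the numeric values match winerror.h). A real accessor would derive from IUrlAccessor and use the SDK's own definitions.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Stand-ins for the Windows definitions so this compiles anywhere.
// The values match winerror.h; real code would include the SDK headers.
using HResult = std::int32_t;
constexpr HResult kOk      = 0;                                   // S_OK
constexpr HResult kNotImpl = static_cast<HResult>(0x80004001u);   // E_NOTIMPL
constexpr HResult kPointer = static_cast<HResult>(0x80004003u);   // E_POINTER

// Hypothetical accessor for an item whose size is known, but which has
// no host name or security provider to report.
class SketchAccessor {
public:
    explicit SketchAccessor(std::uint64_t size) : size_(size) {}

    // Supported: simple metadata the indexer can use directly.
    HResult GetSize(std::uint64_t* size) const {
        if (size == nullptr) return kPointer;
        *size = size_;
        return kOk;
    }

    // Not meaningful here, so say so rather than fake an answer.
    HResult GetHost(wchar_t*, std::uint32_t, std::uint32_t*) const { return kNotImpl; }
    HResult GetSecurityProvider(void*) const { return kNotImpl; }

private:
    std::uint64_t size_;
};

} // namespace sketch
```

The point of the shape is that the caller can tell "not supported" apart from "supported, but failed", which a dummy success value would hide.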

There are a number of methods that aren't used - AddRequestParameter, GetHost and GetSecurityProvider. The simple metadata methods are pretty much self explanatory - GetSize, GetLastModified and GetSecurityDescriptor (although this last one will need investigating a bit more closely). The rest are all about getting an IFilter.

When the indexer is indexing the file system, the IFilter is selected based on file extension. When indexing via IUrlAccessor, there are more interesting things to take into account, and IUrlAccessor allows you to customise this simple file extension mapping. Remember that if you don't need this flexibility, you can just return E_NOTIMPL. Also note that the docs don't give an order in which these methods are called - I've listed them here in fairly random order:

  • GetCLSID allows you to return back a class Id that can handle this file type (such as Microsoft Word). I'm guessing this is to do with ActiveDocuments? The main purpose for this is to be able to have a url such as (http://example.org/wordfile.file) actually be a Word file without having to have a .doc extension.
  • GetDocFormat allows you to specify a MIME type that takes precedence over the url's extension.
  • GetFileName: if your url scheme just happens to map UNC accessible files to urls, you can just return the file name here, and it'll get indexed the same as file system files.
  • BindToStream allows you to provide a stream over your data. The indexer can then read the file contents from the stream and either save them to a temporary file and bind an IFilter to that file, or bind the IFilter directly to the stream.
  • If none of those methods suit and you want to take complete control of hooking up the IFilter or if the data represented by your url isn't a normal desktop file format (such as a row in a database, or, as in our case, an RSS item), you can return your own IFilter implementation from BindToFilter.
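Several of these methods (GetDocFormat, GetFileName, GetRedirectedURL) share the same buffer-plus-length signature, so one helper can serve them all. The sketch below is an illustration only: the types are portable stand-ins for the Windows ones, CopyOutString is a hypothetical helper, the "text/html" answer is just an example for an RSS item, and both the error code for a short buffer and the convention that the reported length excludes the terminator are assumptions rather than documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cwchar>

namespace sketch {

using HResult = std::int32_t;
constexpr HResult kOk      = 0;                                  // S_OK
constexpr HResult kPointer = static_cast<HResult>(0x80004003u);  // E_POINTER
// Assumption: report a short buffer with the HRESULT form of
// ERROR_INSUFFICIENT_BUFFER. The real contract may differ.
constexpr HResult kBufferTooSmall = static_cast<HResult>(0x8007007Au);

// Copies `value` into the caller's buffer of `size` characters and reports
// how many characters were written (excluding the terminator; an assumption).
inline HResult CopyOutString(const wchar_t* value, wchar_t* buffer,
                             std::uint32_t size, std::uint32_t* length) {
    if (buffer == nullptr || length == nullptr) return kPointer;
    const std::size_t needed = std::wcslen(value);
    if (needed + 1 > size) {
        *length = 0;
        return kBufferTooSmall;
    }
    std::wmemcpy(buffer, value, needed + 1);  // includes the terminator
    *length = static_cast<std::uint32_t>(needed);
    return kOk;
}

// Hypothetical GetDocFormat for an item whose body is HTML.
inline HResult GetDocFormat(wchar_t* docFormat, std::uint32_t size,
                            std::uint32_t* length) {
    return CopyOutString(L"text/html", docFormat, size, length);
}

} // namespace sketch
```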

The final two methods alter how the indexing occurs - GetRedirectedURL and IsDirectory.

GetRedirectedURL allows you to return the actual url that should be used while indexing. In other words, if you have a document at a url that gets redirected, this allows you to tell the indexer that a) it's been redirected, and b) any relative links that your IFilter emits are to be resolved against the redirected url. I don't know if this causes the previously stored url to be updated.

IsDirectory tells the indexer that the current url represents a directory. Surprising that. This means the indexer will treat any emitted child urls as being in this folder. Handy for using the "in:" and "under:" search syntax. (Think of searching in Outlook - "in:Inbox", "in:myfolder", "under:trash").
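IsDirectory has no out parameter, so the answer has to travel in the return value itself. The sketch below assumes the usual COM convention of S_OK for "yes" and S_FALSE for "no", and assumes (purely for illustration) a url scheme where folders end in a slash. The types are portable stand-ins, not the SDK definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace sketch {

using HResult = std::int32_t;
constexpr HResult kOk    = 0;  // S_OK: treated as "yes, this is a directory"
constexpr HResult kFalse = 1;  // S_FALSE: the call worked, the answer is "no"

// Assumption for illustration: urls ending in '/' name folders.
inline HResult IsDirectory(const std::wstring& url) {
    return (!url.empty() && url.back() == L'/') ? kOk : kFalse;
}

} // namespace sketch
```

Both values are success codes, so a caller has to compare against S_OK explicitly. A SUCCEEDED-style check would treat every url as a directory.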

So that's IUrlAccessor. Doesn't look too tricky. I think the next thing to look at will be returning links from IFilter - this is how we're going to crawl the whole store. Then it'll have to be how to represent the RSS feed store as urls. Hopefully then I'll be able to get at some code - although threading might rear its ugly head...

Tags:

Windows Desktop Search

Implementing ISearchProtocol

by matt 8. November 2006 23:02

Now that we've got the protocol handler registered, and we've told WDS the default url to start searching against, we need to implement ISearchProtocol. Fortunately, that's very easy:

STDMETHODIMP Init(TIMEOUT_INFO *pTimeoutInfo, IProtocolHandlerSite *pProtocolHandlerSite,
	PROXY_INFO *pProxyInfo)
{
	ATLTRACE(myTraceProtocolHandler,0,_T("ISearchProtocol::Init\n"));

	// We don't have anything to initialise, or to cleanup from an unexpected termination
	return S_OK;
}

STDMETHODIMP CreateAccessor(...)
{
	...
}

STDMETHODIMP CloseAccessor(IUrlAccessor *pAccessor)
{
	ATLTRACE(myTraceProtocolHandler,0,_T("ISearchProtocol::CloseAccessor\n"));

	// We don't want to do anything with this, just let the host close it normally
	// (it calls Release). (This method gives us the chance to pool accessors)
	return S_OK;
}

STDMETHODIMP ShutDown()
{
	ATLTRACE(myTraceProtocolHandler,0,_T("ISearchProtocol::ShutDown\n"));

	// We don't have anything to clean up
	return S_OK;
}

Currently, we have nothing to set up in the initialise phase, and similarly nothing for ShutDown to do. One important thing to note is that the ShutDown method might not get called. If the host process crashes (and there is a good chance it will - there are a lot of 3rd party plugins running around here) ShutDown won't get called, so if you create anything persistent in Init, you need to clean it up on the next Init.

On the face of things, CloseAccessor is an odd method. Surely the host would just call Release on the IUrlAccessor to close it down? And indeed it will. This method allows the host to notify you that it's about to throw away the UrlAccessor. It's an opportunity for you to e.g. pool the accessor object (by AddRef-ing it and chucking it on a list). I'm not interested in this, so I just do nothing.
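For anyone who does want pooling, the reference counting can be sketched like this. It's a toy model rather than ATL code: Accessor and Protocol are hypothetical stand-ins that keep only the ref count, and a real pool would also have to reset each accessor's per-url state before handing it out again.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Minimal stand-in for a COM object: just the reference count.
class Accessor {
public:
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        const unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }

private:
    ~Accessor() = default;    // destroyed only through Release
    unsigned long refs_ = 1;  // the creator owns the first reference
};

class Protocol {
public:
    ~Protocol() {
        for (Accessor* a : pool_) a->Release();  // drop the pool's references
    }

    // Hands out a pooled accessor if one is waiting, otherwise a new one.
    // Either way the caller receives one reference it must Release.
    Accessor* CreateAccessor() {
        if (!pool_.empty()) {
            Accessor* reused = pool_.back();
            pool_.pop_back();
            return reused;  // the pool's reference transfers to the caller
        }
        return new Accessor();
    }

    // The host is about to Release the accessor; keep it alive for reuse.
    void CloseAccessor(Accessor* accessor) {
        accessor->AddRef();
        pool_.push_back(accessor);
    }

    std::size_t pooled() const { return pool_.size(); }

private:
    std::vector<Accessor*> pool_;
};

} // namespace sketch
```

The extra AddRef in CloseAccessor is what keeps the object alive when the host's Release arrives a moment later.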

The biggy is of course CreateAccessor. Here's the code I missed out above:

STDMETHODIMP CreateAccessor(LPCWSTR pcwszURL, AUTHENTICATION_INFO *pAuthenticationInfo, 
	INCREMENTAL_ACCESS_INFO *pIncrementalAccessInfo, 
	ITEM_INFO *pItemInfo, IUrlAccessor **ppAccessor)
{
	ATLTRACE(myTraceProtocolHandler,0,_T("ISearchProtocol::CreateAccessor\n"));

	CComObject<CUrlAccessor> *pUrlAccessor = NULL;
	HRESULT hr=CComObject<CUrlAccessor>::CreateInstance(&pUrlAccessor);
	if(SUCCEEDED(hr))
	{
		// CreateInstance hands back an object with a ref count of zero, so
		// if QueryInterface fails nobody owns it and we must delete it
		hr=pUrlAccessor->QueryInterface(ppAccessor);
		if(FAILED(hr))
			delete pUrlAccessor;
	}
	return hr;
}

It's a bit naive at the moment, but it starts to show the relationship between ISearchProtocol and IUrlAccessor. In comes a url, and out goes an IUrlAccessor object. Whenever you want to get access to a url, you ask the ISearchProtocol. It's kind of analogous to the win32 CreateFile function - if you want access to a file, call CreateFile, get back a handle to the file. Abstract that away from the file system, and you've got ISearchProtocol->CreateAccessor and IUrlAccessor.

This method takes a couple of interesting parameters. It's got the url and some authentication info that is really only of use if you're doing http stuff. It also includes an ITEM_INFO structure that doesn't look too useful. The important one is the INCREMENTAL_ACCESS_INFO structure. This simply contains a size and a FILETIME of the last time the search gatherer knows it was modified. If it's never been searched, this will be zero. If the item hasn't been modified since this timestamp, CreateAccessor can return PRTH_S_NOT_MODIFIED and the item will not be re-indexed. This allows WDS to use an incremental update scheme - all urls reported by your ISearchProtocol will be accessed to see if they've been updated. It's a bit brute force (think trawling your entire hard disk to see if any files have changed), so WDS also supports change notifications, which I'll get to later in the development, once I've figured out how they work. I also need to find out if the modification time is set automatically by the indexer when the url is filtered, or if I have to do it somewhere in my code.
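The decision described above boils down to a timestamp comparison, sketched here with portable stand-ins. FileTime and IncrementalAccessInfo mimic the shape of FILETIME and INCREMENTAL_ACCESS_INFO (reduced to the one field that matters), and AccessDecision is a hypothetical enum standing in for "return an accessor" versus "return PRTH_S_NOT_MODIFIED". Where the item's own last modified time comes from is left open, as it is in the text.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Stand-in for FILETIME: a 64-bit tick count split into two 32-bit halves.
struct FileTime {
    std::uint32_t low;
    std::uint32_t high;
};

// Stand-in for INCREMENTAL_ACCESS_INFO, reduced to the field used here.
struct IncrementalAccessInfo {
    FileTime lastModified;  // all zero if the item has never been indexed
};

enum class AccessDecision { Reindex, NotModified };

inline std::uint64_t ToTicks(const FileTime& ft) {
    return (static_cast<std::uint64_t>(ft.high) << 32) | ft.low;
}

// Decides whether CreateAccessor should hand back an accessor (Reindex)
// or short-circuit with PRTH_S_NOT_MODIFIED (NotModified).
inline AccessDecision Decide(const IncrementalAccessInfo* info,
                             const FileTime& itemLastModified) {
    if (info == nullptr) return AccessDecision::Reindex;
    const std::uint64_t indexed = ToTicks(info->lastModified);
    if (indexed == 0) return AccessDecision::Reindex;  // never indexed
    return ToTicks(itemLastModified) > indexed ? AccessDecision::Reindex
                                               : AccessDecision::NotModified;
}

} // namespace sketch
```

Combining the two halves into one 64-bit value before comparing avoids the classic mistake of comparing only the low part.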

I fully expect the method shown above to get more complex - there's no support for incremental updates, for one thing, but I'm probably also going to have to create a different number of IUrlAccessor objects - one for the root, one for folders and one for feeds. All in good time...

Next up, a bit of a closer look at IUrlAccessor - what data can you get out of it, and how do you use it for indexing?

No code today - I want to look at IUrlAccessor first.

Tags:

Windows Desktop Search

Still registering the protocol handler

by matt 7. November 2006 12:39

I've got my vpc image set up with XP, VC++, WDS2.6.5 and the various SDKs. I've got the code from last time, and I can register a COM object via a ProgId to handle the "msfeed:" protocol. But that's only half the story. I need to tell WDS what urls to index - at least where to start.

This is easier said than done. To quote the original MSN Desktop Search documentation:

After the protocol handler is registered, use the AddDefaultUrl method in the ISearchCrawlScopeManager interface to set default crawling rules to include and exclude particular URLs and child URLs

It doesn't mention where ISearchCrawlScopeManager comes from or what object implements it - do I implement it, or does WDS? Can I get to it from another object, or do I have to CoCreateInstance it?

Questions to keep you awake at night. I can put you out of your misery - ISearchCrawlScopeManager gets passed into ISearchProtocolOptions->GetDefaultCrawlScope. Clear as mud, eh?

Given this brick wall, there's only one thing to do - implement something. I'm going to create a COM object that implements ISearchProtocol, as this is the main interface that WDS is expecting. This will be the Sticklebackplastic.MsFeedProtocolHandler.1 object we said would handle "msfeed:" so make sure to ignore these instructions and keep the ProgId! I've given this object a threading model of "Both" for the moment. Easy option. I've also created my own trace category, and used the ATLTRACENOTIMPL macro in each method. That, coupled with ATL_DEBUG_QI should report any activity on that object, including unimplemented interfaces. (I could paste the code here, but it's long and not currently very interesting and this post is going to be long enough without it, so just download it all at the end...)

You're going to need DebugView now - atl will write debug info when one of my unimplemented methods is called, or when an unimplemented interface is requested.

But to get that info, I need the code running, and the only way I can get the code running right now is to register the dll.

What I really want WDS to do is get my request to register the protocol handler, fire up the object via its ProgId, and ask for an interface, or call an unimplemented method so that I can see what to implement next.

The only problem is a bug in 2.6.5. WDS just sits there, oblivious.

You have to give it a bit of a kick, either by exiting WDS and restarting, or by selecting "Index Now" (which you might have to do a couple of times before it actually does the trick). Now, the protocol handler will get created by WDS, and it will ask for ISearchProtocol. It also asks for ISearchProtocolOptions, which hasn't yet been implemented. (I also get asked for ISearchProtocolThreadContext, which is defined in the Vista Windows SDK. I'm going to ignore that for now.)

Now I can implement my GetDefaultCrawlScope, and call AddDefaultUrl with a default url of "msfeed://localhost/". Going to the search options dialog shows this url with a checkbox as one of the things being searched. Looks like we're off to a good start.

You can add more than one url at a time. I could add "msfeed://localhost/" and "msfeed://localhost/matt/". Only the first would show up in the options dialog (perhaps because matt is a directory under localhost?) Add them in the opposite order, and both show up. Go figure.

Another somewhat annoying thing is that if I unregister the protocol handler (using ISearchManager->RemoveProtocol) it doesn't remove it from the options dialog.

There's more to ISearchProtocolOptions than just GetDefaultCrawlScope, but nothing else is being asked for yet, so I'm going to leave it at that.

Next on the agenda is implementing ISearchProtocol.

You can download the current code here.

Tags:

Windows Desktop Search

Vista and the Desktop Search protocol handler

by matt 1. November 2006 22:03

Of course, installing Vista does make developing a WDS protocol handler just a tad more awkward. You see, as I hinted at before, I'd rather build on WDS 2.6.x and Vista comes with 3.0.

The major difference between the two is that 2.x runs in the context of the logged on user, and 3.0 runs as a service. And the msfeed APIs get you the feeds for the currently logged on user. I'd rather solve the Vista-runs-as-a-service problem separately to solving the write-a-working-protocol-handler problem.

Good old virtualisation to the rescue. I'll have a Windows XP image up and running in no time.

Tags:

Windows Desktop Search | Vista

Registering your protocol handler. Actual code.

by matt 18. October 2006 20:32

Enough procrastination. I've set the lofty goal. I've worried about documentation. I've fretted about SDKs. I've even lectured on how I want my project set up. I haven't written a single line of code. For shame.

To be fair, this is going to be a little dull, so let's rush through the boring bits. Following my own advice, I've created an ATL project called msfeedph and deleted all the unnecessary bits. I've also added the include directories of the WDS SDK, the msfeeds API mini-SDK + the SharePoint Portal 2001 SDK to the project. I've then included wdsSetup.h in the stdafx.h.

Now we're ready to register our protocol handler. The docs say that we need to call ISearchManager::AddProtocol. Looking at wdsSetup.idl, we can see that the SearchManager object implements ISearchManager, so let's create one of those, and call AddProtocol. We need to pass in a protocol name, and the ProgId of an object that will handle that protocol. We'll use "msfeed" as the protocol name, and "Sticklebackplastic.MsFeedProtocolHandler.1" as the ProgId. We'll also do a reciprocal RemoveProtocol. Here's the code:

STDAPI DllRegisterServer(void)
{
    // Register the COM classes first (FALSE means no typelib)
    HRESULT hr = _AtlModule.DllRegisterServer(FALSE);
    if(FAILED(hr))
        return hr;

    // Then tell WDS which ProgId handles the "msfeed" protocol
    CComPtr<ISearchManager> spSearchManager;
    hr = spSearchManager.CoCreateInstance(CLSID_SearchManager);
    if(SUCCEEDED(hr))
    {
        hr = spSearchManager->AddProtocol(L"msfeed", L"Sticklebackplastic.MsFeedProtocolHandler.1");
    }

    return hr;
}

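STDAPI DllUnregisterServer(void)
{
    // Attempt both steps, but remember if unregistering the classes failed
    HRESULT hrModule = _AtlModule.DllUnregisterServer(FALSE);
    CComPtr<ISearchManager> spSearchManager;
    HRESULT hr = spSearchManager.CoCreateInstance(CLSID_SearchManager);
    if(SUCCEEDED(hr))
    {
        hr = spSearchManager->RemoveProtocol(L"msfeed");
    }

    // Report the first failure rather than silently overwriting it
    return FAILED(hrModule) ? hrModule : hr;
}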

Excellent. Just one small problem - it doesn't compile. CLSID_SearchManager is an undefined symbol. We need to add a new file, that I've called DefineGuids.cpp:

#include "stdafx.h"
#define INITGUID
#include <wdsSetup_i.c>

This just includes the C part of the file generated from wdsSetup.idl, which, fortunately for us, defines the GUIDs from the idl file.
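The reason the symbol goes missing is the difference between declaring a GUID and giving it storage. Here is a portable sketch of that split. The Guid struct is a stand-in for the Windows GUID, CLSID_SketchManager is a made-up name with a made-up value, and in a real project the two halves would live in a header and in a single .cpp file rather than side by side.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace sketch {

// Stand-in for the Windows GUID structure (4 + 2 + 2 + 8 bytes).
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

inline bool IsEqualGuid(const Guid& a, const Guid& b) {
    return std::memcmp(&a, &b, sizeof(Guid)) == 0;
}

} // namespace sketch

// What a generated header gives every file: a declaration only.
// Code can refer to the name, but no storage exists for it yet.
extern const sketch::Guid CLSID_SketchManager;

// What exactly one source file has to provide: the definition.
// The value is a placeholder, not the real CLSID_SearchManager.
const sketch::Guid CLSID_SketchManager = {
    0x12345678u, 0x9abc, 0xdef0,
    {0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef}};
```

Leave out the second half and every use of the name still compiles, but nothing ever supplies the value.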

Now we compile and register and all is well. We don't yet have a "Sticklebackplastic.MsFeedProtocolHandler.1" class so nothing appears in WDS yet, but it's a start. We can check the registry (HKCU\Software\Microsoft\RSSearch\ProtocolHandlers) and see that it's there.

The next step is to set the default urls that WDS will crawl.

You can download the current code here.

Tags:

Windows Desktop Search

SDKs for search protocol handlers

by matt 5. October 2006 18:36

Sheesh. After finally detailing how I set up a blank ATL COM project, I thought I might as well actually begin to develop something. Of course I'd forgotten something. We've already seen what a mess the documentation is in, but what about SDKs?

Firstly, we're going to need the Windows Desktop Search API SDK. This is a nice little zip file that includes a couple of idl files. There's a wdsQuery.idl that gives a coclass and an interface for issuing queries, a wdsView.idl that appears to allow you to host the WDS results viewer (with previewer?). The one we're interested in, though, is wdsSetup.idl, that gives a few interfaces to aid in registering a protocol handler. It even includes the CLSID of the object that implements ISearchManager that I was worrying about last time.

But it doesn't include the ISearchProtocol interface itself.

Looking at the Other Resources section of the MSN Search Toolbar guide (isn't it called Windows Desktop Search these days, guys?) there's a link to some documentation on protocol handlers (as I pointed out last time), and there's a link to a knowledge base article that includes a link to download the 2001 SharePoint Portal SDK, which includes the definition of ISearchProtocol. We're going to use this.

It's probably worth getting the latest Platform SDK, too. You could go cutting edge and get the latest beta of the Windows SDK - but bear in mind that it includes the beta bits of the .net 3 framework...

Of course, since the whole point of this protocol handler is to search IE7's RSS feed store, you'll need headers for that too. They are included in the Windows SDK, or you can get them on the cheap from the RSS team blog.

Right. I think we're set. But I've said that before.

Tags:

Windows Desktop Search

The prototypical ATL COM project

by matt 2. October 2006 21:14

Now there's a title to make your pulse race.

This post is ostensibly another in the how-to-write-a-Windows-Desktop-Search-protocol-handler series, but it's really just what-Matt-does-to-write-a-COM-object. I've done it for just about every COM object I've written; I have a feeling I'll be referring back to it.

And yes, that title does say ATL. It's C++ time.

I'm not going into the C++/.net COM object argument just now. I will at some point - I've got plenty to say about it - but for this, I'm using C++. Not everything is a .net shaped nail.

Of course, these tips make certain assumptions. I'm talking about COM objects that are in-process and that only implement other people's interfaces (think shell extensions, addins for Office, Visual Studio or Windows Desktop Search. Pretty much any kind of addins). Some of this won't work for situations such as scriptable components, custom objects for a custom application and so on. Your mileage may vary.

Right then. Fire up Visual Studio and create a new ATL project. We want a dll and we don't want it attributed. I'm a bit of a Luddite with regards to C++ attributes. I just know it's a code generation thing, and I'm old-skool enough to write this stuff long-hand, thanks. (That said, I'm reading the link above, and it does look rather interesting. Might have to have a play with this...)

I'm going to assume the project was called "dllname", and I'm going to leave it as an exercise to the reader to substitute "dllname" for the right name for the rest of this post.

1. Remove proxy stub project

I haven't once used this. I'm not implementing my own interfaces, so I don't need to provide any custom marshalling. Right click on the PS project and delete it.

2. Get rid of the typelib

Again, I'm not implementing my own interfaces, so I don't need to describe them to anyone. If I don't remove the typelib, ATL will register an essentially empty typelib in the registry for me, and will store the typelib as a resource in my dll. Both of these are unnecessary bloat. Trivial, yes, but easily prevented (we should still try to be as streamlined as possible, even in this day of cheap memory and storage).

Firstly, open up the dllname.cpp file, and modify DllRegisterServer. Simply add a FALSE to the arguments, so that the typelib isn't registered:

HRESULT hr = _AtlModule.DllRegisterServer(FALSE);

Do a similar thing for DllUnregisterServer:

HRESULT hr = _AtlModule.DllUnregisterServer(FALSE);

Now we need to remove the typelib from the resources. Open dllname.rc, and in the resource view, right click on the dllname.rc node (not the dllname project node!). Select "Resource Includes". You're going to get a dialog with two big edit boxes. In the bottom one will be the line:

1 TYPELIB "dllname.tlb"

Delete this.

Don't delete the .idl file from the project! This does create the type library (.tlb) but the ATL wizards need this file.

3. Don't register an AppId

Again, never needed this. Open the dllname.rgs file and remove the AppId section. Easy peasy.

4. Enable QueryInterface debugging

This one's a corker. Saved the best til last. Once you've done this, every time a client calls one of your objects' QueryInterface methods, you'll get a string written via OutputDebugString with the class name, the name of the interface requested (from the registry) and a "failed" marker if the interface isn't supported.

This is an absolutely essential debugging aid when working with COM. Quite often the documentation will tell you what all the interfaces are, what each one does, but not which combination of interfaces to implement on a given object. Or, it only lists the interfaces for that particular API domain, and doesn't list the standard COM interfaces required, such as IPersistFile, etc.

Edit your stdafx.h and define _ATL_DEBUG_QI before including any atl header files.

Note that there's a bug in AtlDumpIID (the function that does all the magic) in atlbase.h. If the interface name isn't found in the registry, the code is supposed to output the raw IID, but the logic is wrong, and nothing gets output. It's fixed in Visual Studio 2005 and later, so if you have anything earlier, check out Craig Skibo's post on the problem. He offers a solution that you just have to copy into the file. It's a little verbose (I'd have just set a flag on success and checked that at the end of the file rather than have all those goto's...)
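The flag-based alternative mentioned above might look something like this. It is a portable toy, not the ATL source: InterfaceNames is a hypothetical stand-in for the registry lookup, and the "class - interface - failed" line format is an approximation of the debug output rather than a copy of it.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// Hypothetical stand-in for the registry's interface list: IID text -> name.
using InterfaceNames = std::map<std::string, std::string>;

// Returns the friendly name when it is known and the raw IID otherwise,
// so that something is always produced.
inline std::string DescribeInterface(const InterfaceNames& names,
                                     const std::string& iid) {
    bool found = false;
    std::string text;
    const auto it = names.find(iid);
    if (it != names.end()) {
        text = it->second;
        found = true;
    }
    if (!found) text = iid;  // the fallback the buggy version never reached
    return text;
}

// Formats one trace line in the spirit of the QueryInterface debug output.
inline std::string TraceLine(const std::string& className,
                             const InterfaceNames& names,
                             const std::string& iid, bool supported) {
    std::string line = className + " - " + DescribeInterface(names, iid);
    if (!supported) line += " - failed";
    return line;
}

} // namespace sketch
```

Checking the flag once at the end means every path, found or not, writes some text.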

A top tip in conjunction with this is to use Sysinternals' DebugView to capture all those messages to OutputDebugString, even when you're not in the debugger. Oh, and if you don't know what a raw IID actually refers to, don't forget Google. Or install the Platform SDK and Windows Desktop Search.

5. Cleanup object rgs scripts

Now, this one is for when you create a new object, rather than the first steps of a project like the rest. So, go on. Create an object. Now open its .rgs file. The first thing we can get rid of is the typelib reference. We're not registering that typelib any more, so we can get rid of that. And I always get rid of "Programmable", and if you're not using ProgIds, get rid of those too.

And that's it. Just a couple of simple changes that I make every time I start a COM object.

PS. Whatever development you're doing (C++/.net/asp.net/whatever) you really, really want to use Microsoft's symbol server.

Tags:

Windows Desktop Search

Documentation for search protocol handlers

by matt 31. August 2006 19:14

Well, I've found some more documentation. And it's got me a bit nervous.

The IFilter stuff is fine. I don't think that's changed at all since it was first created for NT4.

The problem is with the protocol handlers.

The MSN Toolbar docs point to this documentation for Sharepoint Portal Server SDK 2001. It, of course, talks about everything from a Sharepoint Portal Server 2001 point of view. And I of course have absolutely no idea what's going on. It's written from the point of view that you already know what each part of the system is, what they do, and how they talk. It's also 5 years old. Still, it links to the reference info for the required interfaces, so I guess it's a good start. There's even a sample, which seems to be the only place you can get the header files for ISearchProtocol.

Getting a little more modern, there's a slightly expanded version of the same docs for SharePoint Portal Server 2003 - only 3 years out of date. It has a nice little troubleshooting section, and provides a bit more detail on a couple of the interfaces to implement. There's also a bit of a reference section. But we're still talking SharePoint, not WDS.

Then there's the MSN Toolbar docs themselves. Again, it's all a bit everything-you-need-without-any-detail. But it does detail how you register a protocol handler (without telling you how to get the ISearchManager interface). The big problem is that it's for Windows Desktop Search when it was still called the MSN Desktop Search. I'm presuming the registration still works...

Things get really interesting when you go to the Windows Shell MSDN library page. Looking in the tree view on the left you'll see nodes for WDS 2.x and WDS 3.0. Only the 2.x version has a landing page. It's described as preliminary documentation, and only talks about IFilter. The interfaces listed in the reference section aren't terribly useful, either.

The WDS 3.0 section only has reference information, and lists 39 interfaces, a couple of which are also mentioned in the Sharepoint docs. Unfortunately, its ISearchManager interface has completely different methods to the MSN Toolbar interface of the same name, and offers no way of registering protocol handlers. It looks like WDS has had quite an overhaul for version 3/Vista. This post on MSDN's forums seems to confirm that there's quite a difference.

And that's what's making me nervous. I've currently got WDS 3.0 Beta 2 running, but I think I'd be happier developing towards the 2.x version - because it looks more like the SharePoint stuff that I've got a sample for. Plus, I don't have the headers for the Vista stuff - I don't fancy downloading a beta version of the Platform SDK, especially since it contains the beta WinFx stuff.

(And it looks like I've got an out-of-date PSDK as it is. Here's the link to the Windows Server 2003 R2 Platform SDK - March 2006 Edition.)

So, first plan of action is to uninstall 3.0 and reinstall the latest 2.x version. Then perhaps we can get started with some actual dev work!

Tags:

Windows Desktop Search

Extending Windows Desktop Search

by matt 30. August 2006 22:33

One thing Microsoft gets is Platforms. Just about everything they do is a platform. Not for them the quick app. Oh no. It has to be extensible.

I'm writing this in Windows Live Writer. It supports plugins. Each Office application supports addins and both internal and external scripting. Explorer is built entirely out of extensions, as is Visual Studio.

Windows Desktop Search is one such platform. Not only is it a rather smashing search engine (and the evolution of the indexing service found way back in Windows NT 4) but it's also incredibly extensible.

There are three and a half ways of extending WDS. Firstly, you can create a COM object that implements the IFilter interface. This is how WDS can understand and index different file formats. And, if the item to be indexed isn't a file on a file system, you can create an object that implements ISearchProtocol and its related interfaces. And finally, you can implement a preview for your file type.

(The half that's left over is a really interesting one, and something I intend to write up at some point - WDS will use any property handlers you have associated with a file type. This is something that is seriously under-documented (look for the line marked metadata handler), and looks like it's changed drastically under Vista.)

Now, I've got a plan. For a while now, I've wanted to change my feed reader from Sharpreader to, well, just about anything else to be honest (but that's for another post). IE7 came along with the Windows Feed Platform (another platform - see what I mean?) and so I've been meaning to migrate my feeds over here. This will happen in due course, but this and WDS together got me thinking - and not just me, either. Brandon LeBlanc asked the question I thought of, as did some unnamed wiki editor.

I'm going to try and write a search protocol for IE7's RSS feed platform, and just to make it more fun, I intend to blog each step in the process. We'll start at the very beginning - getting the project set up and figuring out where the documentation is.

Tags:

Windows Desktop Search

License

Creative Commons License
Except where otherwise noted, content on this site is by Matt Ellis and is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

©2010 Matt Ellis