Now that we've got the protocol handler registered, and we've told WDS the default url to start searching against, we need to implement ISearchProtocol. Fortunately, that's very easy:
STDMETHODIMP Init(TIMEOUT_INFO *pTimeoutInfo, IProtocolHandlerSite *pProtocolHandlerSite,
    PROXY_INFO *pProxyInfo)
{
    ATLTRACE(myTraceProtocolHandler,0,_T("ISearchProtocol::Init\n"));
    // We don't have anything to initialise, or to cleanup from an unexpected termination
    return S_OK;
}
STDMETHODIMP CreateAccessor(...)
{
    ...
}
STDMETHODIMP CloseAccessor(IUrlAccessor *pAccessor)
{
    ATLTRACE(myTraceProtocolHandler,0,_T("ISearchProtocol::CloseAccessor\n"));
    // We don't want to do anything with this, just let the host close it normally
    // (it calls Release). (This method gives us the chance to pool accessors)
    return S_OK;
}
STDMETHODIMP ShutDown()
{
    ATLTRACE(myTraceProtocolHandler,0,_T("ISearchProtocol::ShutDown\n"));
    // We don't have anything to clean up
    return S_OK;
}
Currently, we have nothing to set up in the initialise phase, and similarly nothing for ShutDown to do. One important thing to note is that the ShutDown method might not get called. If the host process crashes (and there is a good chance it will - there are a lot of 3rd party plugins running around here), ShutDown is skipped entirely, so if you create anything persistent in Init, you need to clean it up on the next Init.
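To make the "clean it up on the next Init" advice concrete, here is a minimal, portable sketch of the pattern. It is not the handler's code (which has nothing persistent yet): the `SessionState` class and the scratch file it owns are invented purely for illustration, and it uses the standard library rather than anything WDS-specific.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical helper, not part of WDS: owns one piece of persistent state
// (a scratch file) that Init creates and ShutDown removes.
class SessionState
{
public:
    explicit SessionState(std::string scratchFile)
        : m_scratchFile(std::move(scratchFile))
    {
        assert(!m_scratchFile.empty());
    }

    // Call from Init. Returns true if a previous session left its state
    // behind, i.e. it never reached ShutDown.
    bool Init()
    {
        // std::remove returns 0 on success, so this is true only when a
        // stale file existed and has now been deleted.
        const bool recovered = (std::remove(m_scratchFile.c_str()) == 0);

        // Create this session's state from scratch.
        std::ofstream out(m_scratchFile.c_str());
        out << "in use";
        return recovered;
    }

    // Call from ShutDown. May never run if the host process crashes.
    void ShutDown()
    {
        std::remove(m_scratchFile.c_str());
    }

private:
    std::string m_scratchFile;
};
```

The point of the design is that Init never trusts ShutDown to have run: it always starts by removing whatever might be left over, so a crash costs nothing more than one stale file that lives until the next start.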
On the face of things, CloseAccessor is an odd method. Surely the host would just call Release on the IUrlAccessor to close it down? And indeed it will. This method allows the host to notify you that it's about to throw away the UrlAccessor. It's an opportunity for you to e.g. pool the accessor object (by AddRef-ing it and chucking it on a list). I'm not interested in this, so I just do nothing.
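For anyone who does want pooling, the idea of "AddRef-ing it and chucking it on a list" could look something like the sketch below. It is a simplified model rather than real COM code: `Accessor` is a stand-in with just enough reference counting to show the mechanics, and `AccessorPool`, `Park` and `Acquire` are hypothetical names. A real pool would also have to reset the accessor for its new url before handing it out again.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a COM object: only the reference counting is modelled.
class Accessor
{
public:
    unsigned long AddRef() { return ++m_refs; }
    unsigned long Release()
    {
        const unsigned long refs = --m_refs;
        if (refs == 0)
            delete this;
        return refs;
    }

private:
    ~Accessor() {}              // destroyed only via Release
    unsigned long m_refs = 0;
};

// Hypothetical pool that CloseAccessor and CreateAccessor could share.
class AccessorPool
{
public:
    ~AccessorPool()
    {
        for (Accessor* idle : m_idle)
            idle->Release();
    }

    // From CloseAccessor: take our own reference so the object survives
    // the Release the host is about to make.
    void Park(Accessor* accessor)
    {
        assert(accessor != nullptr);
        accessor->AddRef();
        m_idle.push_back(accessor);
    }

    // From CreateAccessor: reuse a parked object if there is one, otherwise
    // create a new one. The caller owns one reference on the result.
    Accessor* Acquire()
    {
        if (m_idle.empty())
        {
            Accessor* fresh = new Accessor;
            fresh->AddRef();
            return fresh;
        }
        Accessor* reused = m_idle.back();
        m_idle.pop_back();
        return reused;          // the pool's reference passes to the caller
    }

    std::size_t IdleCount() const { return m_idle.size(); }

private:
    std::vector<Accessor*> m_idle;
};
```

Because the pool holds its own reference, the host's Release after CloseAccessor drops the count to one rather than zero, and the object stays alive on the list until the next Acquire.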
The biggy is of course CreateAccessor. Here's the code I missed out above:
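STDMETHODIMP CreateAccessor(LPCWSTR pcwszURL, AUTHENTICATION_INFO *pAuthenticationInfo,
    INCREMENTAL_ACCESS_INFO *pIncrementalAccessInfo,
    ITEM_INFO *pItemInfo, IUrlAccessor **ppAccessor)
{
    ATLTRACE(myTraceProtocolHandler,0,_T("ISearchProtocol::CreateAccessor\n"));
    CComObject<CUrlAccessor> *pUrlAccessor=NULL;
    HRESULT hr=CComObject<CUrlAccessor>::CreateInstance(&pUrlAccessor);
    if(FAILED(hr))
        return hr;
    // CreateInstance returns the object with a reference count of zero, so hold a
    // reference across QueryInterface. If the query fails, Release destroys the object.
    pUrlAccessor->AddRef();
    hr=pUrlAccessor->QueryInterface(ppAccessor);
    pUrlAccessor->Release();
    return hr;
}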
It's a bit naive at the moment, but it starts to show the relationship between ISearchProtocol and IUrlAccessor. In comes a url, and out goes an IUrlAccessor object. Whenever you want to get access to a url, you ask the ISearchProtocol. It's kind of analogous to the Win32 CreateFile function - if you want access to a file, call CreateFile, get back a handle to the file. Abstract that away from the file system, and you've got ISearchProtocol->CreateAccessor and IUrlAccessor.
This method takes a couple of interesting parameters. It's got the url and some authentication info that is really only of use if you're doing http stuff. It also includes an ITEM_INFO structure that doesn't look too useful. The important one is the INCREMENTAL_ACCESS_INFO structure. This simply contains a size and a FILETIME holding the last modification time the search gatherer knows about for the item. If the item has never been indexed, this will be zero. If the item hasn't been modified since this timestamp, CreateAccessor can return PRTH_S_NOT_MODIFIED and the item will not be re-indexed. This allows WDS to use an incremental update scheme - all urls reported by your ISearchProtocol will be accessed to see if they've been updated. It's a bit brute force (think trawling your entire hard disk to see if any files have changed), so WDS also supports change notifications, which I'll get to later in the development, once I've figured out how they work. I also need to find out if the modification time is set automatically by the indexer when the url is filtered, or if I have to do it somewhere in my code.
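The incremental decision itself boils down to a timestamp comparison. The sketch below models it in portable C++ so it can be tried outside the host: `FileTime` mirrors the two 32-bit halves of a Win32 FILETIME, while `CreateResult`, `ToTicks` and `CheckModified` are hypothetical names, and the enum merely stands in for S_OK and PRTH_S_NOT_MODIFIED. Where the item's own modification time comes from is exactly the open question above, so here it is simply passed in.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the real result codes (S_OK / PRTH_S_NOT_MODIFIED).
enum class CreateResult { Created, NotModified };

// Same layout idea as a Win32 FILETIME: a 64-bit count of 100ns ticks,
// stored as two 32-bit halves.
struct FileTime
{
    std::uint32_t low;
    std::uint32_t high;
};

// Combine the halves before comparing; comparing them separately is wrong.
inline std::uint64_t ToTicks(const FileTime& ft)
{
    return (static_cast<std::uint64_t>(ft.high) << 32) | ft.low;
}

// lastIndexed  - the time the gatherer passed in (zero if never indexed)
// itemModified - when the item behind the url last changed (assumed known)
CreateResult CheckModified(const FileTime& lastIndexed, const FileTime& itemModified)
{
    const std::uint64_t indexed = ToTicks(lastIndexed);
    if (indexed == 0)
        return CreateResult::Created;       // never indexed, always hand it over
    if (ToTicks(itemModified) <= indexed)
        return CreateResult::NotModified;   // unchanged since the last crawl
    return CreateResult::Created;
}
```

In the real method, the `NotModified` branch would correspond to returning PRTH_S_NOT_MODIFIED without creating an accessor at all.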
I fully expect the method shown above to get more complex - there's no support for incremental updates, for one thing, but I'm probably also going to have to create a number of different IUrlAccessor objects - one for the root, one for folders and one for feeds. All in good time...
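As a rough idea of how CreateAccessor might eventually choose between those accessor types, here is a small sketch that classifies a url. Everything about the url layout is an assumption made up for the example (an empty path means the root, a trailing slash means a folder, anything else is a feed), as are the `AccessorKind` and `ClassifyUrl` names. The real scheme may look nothing like this.

```cpp
#include <cassert>
#include <string>

enum class AccessorKind { Root, Folder, Feed };

// Hypothetical convention, for illustration only:
//   scheme://            -> root
//   scheme://news/       -> folder (trailing slash)
//   scheme://news/daily  -> feed
AccessorKind ClassifyUrl(const std::string& url)
{
    const std::string::size_type schemeEnd = url.find("://");
    assert(schemeEnd != std::string::npos);     // sketch expects a full url

    const std::string path = url.substr(schemeEnd + 3);
    if (path.empty() || path == "/")
        return AccessorKind::Root;
    if (path.back() == '/')
        return AccessorKind::Folder;
    return AccessorKind::Feed;
}
```

CreateAccessor could then switch on the result and create the matching accessor class, keeping the url parsing in one place.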
Next up, a bit of a closer look at IUrlAccessor - what data can you get out of it, and how do you use it for indexing?
No code today - I want to look at IUrlAccessor first.