Implementing IUrlAccessor

by Matt 27. November 2006 10:35

So, if you tilt your head, squint a little bit and use just a dash of imagination, you can say that ISearchProtocol::CreateAccessor is kind of analogous to CreateFile - it abstracts away access to whatever the url refers to, just as CreateFile abstracts away access to the file system and the hard disk.

But while CreateFile gives you a handle you can pass in to other API functions, CreateAccessor returns back an instance of the IUrlAccessor interface.

And this is where the whole CreateFile analogy breaks down somewhat. Didn't last long, really, did it? IUrlAccessor is not intended to be an equivalent file system API for a url protocol. Instead, it's about getting access to any of the url's data that's required for indexing - "file" metadata (size, last modified date), security data and actual "file" contents. (When I say "file" it's really just shorthand for "resource referred to by the url passed to CreateAccessor".)

This simple metadata is available directly from the interface (GetSize, GetLastModified, GetSecurityDescriptor), but getting at the contents is a bit more work.
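GetLastModified hands back a FILETIME, which counts 100-nanosecond ticks since 1 January 1601 (UTC). If your data source keeps its dates in some other form, you'll need a conversion. Here's a rough sketch, assuming the source gives you Unix seconds (that's my assumption, not something the interface requires). The FILETIME struct is a stand-in so the code compiles without the Windows SDK, and UnixSecondsToFileTime and ToTicks are made-up helper names.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Stand-in for the Windows FILETIME struct (normally from <windows.h>).
struct FILETIME {
    std::uint32_t dwLowDateTime;
    std::uint32_t dwHighDateTime;
};

// Seconds between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch).
constexpr std::uint64_t kEpochDifferenceSeconds = 11644473600ULL;
// FILETIME ticks are 100 ns each, so 10,000,000 per second.
constexpr std::uint64_t kTicksPerSecond = 10000000ULL;

// Hypothetical helper: convert Unix seconds to a FILETIME.
inline FILETIME UnixSecondsToFileTime(std::uint64_t unixSeconds) {
    const std::uint64_t ticks =
        (unixSeconds + kEpochDifferenceSeconds) * kTicksPerSecond;
    FILETIME ft;
    ft.dwLowDateTime = static_cast<std::uint32_t>(ticks & 0xFFFFFFFFu);
    ft.dwHighDateTime = static_cast<std::uint32_t>(ticks >> 32);
    return ft;
}

// Hypothetical helper: recombine the two halves into a 64-bit tick count.
inline std::uint64_t ToTicks(const FILETIME& ft) {
    return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) |
           ft.dwLowDateTime;
}

// Usage: the Unix epoch itself maps to the epoch difference in ticks.
inline void usageExample() {
    const FILETIME ft = UnixSecondsToFileTime(0);
    assert(ToTicks(ft) == kEpochDifferenceSeconds * kTicksPerSecond);
}

}  // namespace sketch
```

A real GetLastModified would copy the resulting struct into *pftLastModified and return S_OK.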

Obviously, the indexer cannot know about the format of all "files" it's asked to index. Especially when you consider that some files contain just content (such as plain text), some contain only metadata (such as mp3 files) and some contain both (e.g. Word files). We need another layer of abstraction. And that's where IFilter comes in.

The IFilter interface is called by the indexer to retrieve content and metadata from the underlying data source (file/url). The primary purpose of IUrlAccessor is to retrieve an IFilter for the resource represented by the url. So let's take a closer look at IUrlAccessor:

interface IUrlAccessor: IUnknown
{
    HRESULT AddRequestParameter([in] PROPSPEC *pSpec,
                                [in] PROPVARIANT *pVar);
    HRESULT GetDocFormat([out, length_is(*pdwLength), size_is(dwSize)] WCHAR wszDocFormat[],
                         [in] DWORD dwSize,
                         [out] DWORD *pdwLength);
    HRESULT GetCLSID([out] CLSID *pClsid);
    HRESULT GetHost([out, length_is(*pdwLength), size_is(dwSize)] WCHAR wszHost[],
                    [in] DWORD dwSize,
                    [out] DWORD *pdwLength);
    HRESULT IsDirectory();
    HRESULT GetSize([out] ULONGLONG *pllSize);
    HRESULT GetLastModified([out] FILETIME *pftLastModified);
    HRESULT GetFileName([out, length_is(*pdwLength), size_is(dwSize)] WCHAR wszFileName[],
                        [in] DWORD dwSize,
                        [out] DWORD *pdwLength);
    HRESULT GetSecurityDescriptor([out, size_is(dwSize)] BYTE *pSD,
                                  [in] DWORD dwSize,
                                  [out] DWORD *pdwLength);
    HRESULT GetRedirectedURL([out, length_is(*pdwLength), size_is(dwSize)] WCHAR wszRedirectedURL[],
                             [in] DWORD dwSize,
                             [out] DWORD *pdwLength);
    HRESULT GetSecurityProvider([out] CLSID *pSPClsid);
    HRESULT BindToStream([out] IStream **ppStream);
    HRESULT BindToFilter([out] IFilter **ppFilter);
};

It's a bit of an odd interface, really - you're actually not expected to implement all of it. Methods that don't make sense for your implementation should return E_NOTIMPL.
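One way to keep the E_NOTIMPL boilerplate out of the way is a base class that declines everything, so a concrete accessor only overrides what it actually supports. This is a minimal sketch, not the real interface: the HRESULT constants are stand-ins so it compiles without the Windows SDK, only four of the methods are shown, the COM plumbing (IUnknown, reference counting) is left out, and UrlAccessorBase and RssItemAccessor are made-up names.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Stand-ins for the Windows SDK definitions (assumption: a real build
// would get these from <windows.h> instead).
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_NOTIMPL = static_cast<HRESULT>(0x80004001u);
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003u);

struct IStream;  // opaque in this sketch
struct IFilter;  // opaque in this sketch

// Hypothetical base class: every method declines by default.
class UrlAccessorBase {
public:
    virtual ~UrlAccessorBase() = default;
    virtual HRESULT IsDirectory() { return E_NOTIMPL; }
    virtual HRESULT GetSize(std::uint64_t* /*pllSize*/) { return E_NOTIMPL; }
    virtual HRESULT BindToStream(IStream** /*ppStream*/) { return E_NOTIMPL; }
    virtual HRESULT BindToFilter(IFilter** /*ppFilter*/) { return E_NOTIMPL; }
};

// Hypothetical accessor that only knows its size; everything else
// falls through to the E_NOTIMPL defaults.
class RssItemAccessor : public UrlAccessorBase {
public:
    explicit RssItemAccessor(std::uint64_t sizeInBytes) : size_(sizeInBytes) {}

    HRESULT GetSize(std::uint64_t* pllSize) override {
        if (pllSize == nullptr) return E_POINTER;
        *pllSize = size_;
        return S_OK;
    }

private:
    std::uint64_t size_;
};

// Usage: supported methods succeed, the rest report E_NOTIMPL.
inline void usageExample() {
    RssItemAccessor accessor(1234);
    std::uint64_t size = 0;
    assert(accessor.GetSize(&size) == S_OK && size == 1234);
    assert(accessor.BindToStream(nullptr) == E_NOTIMPL);
}

}  // namespace sketch
```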

There are a number of methods that aren't used - AddRequestParameter, GetHost and GetSecurityProvider. The simple metadata methods are pretty much self-explanatory - GetSize, GetLastModified and GetSecurityDescriptor (although this last one will need investigating a bit more closely). The rest are all about getting an IFilter.

When the indexer is indexing the file system, the IFilter is selected based on file extension. When indexing via IUrlAccessor, there are more interesting things to take into account, and IUrlAccessor allows you to customise this simple file extension mapping. Remember that if you don't need this flexibility, you can just return E_NOTIMPL. Also note that the docs don't give an order in which these methods are called - I've listed them here in fairly random order:

  • GetCLSID allows you to return a class id that can handle this file type (such as Microsoft Word). I'm guessing this is to do with ActiveDocuments? The main purpose of this is to let a url such as http://example.org/wordfile.file actually be a Word file without needing a .doc extension.
  • GetDocFormat allows you to specify a MIME type that takes precedence over the url's extension.
  • GetFileName is for when your url scheme just happens to map UNC accessible files to urls - you can just return the file name here, and it'll get indexed the same as file system files.
  • BindToStream allows you to provide a stream over your data. The indexer can then read the file contents from the stream and either save them to a temporary file and bind an IFilter to that file, or bind the IFilter directly to the stream.
  • If none of those methods suit and you want to take complete control of hooking up the IFilter or if the data represented by your url isn't a normal desktop file format (such as a row in a database, or, as in our case, an RSS item), you can return your own IFilter implementation from BindToFilter.
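The string-returning methods in that list (GetDocFormat, GetFileName, and GetRedirectedURL too) all share the same shape: the caller supplies a buffer and its size, and you report how much you wrote. Here's a sketch of that pattern for GetDocFormat. Treat the details as assumptions: I'm taking *pdwLength to be the character count without the terminating null, I've picked HRESULT_FROM_WIN32(ERROR_INSUFFICIENT_BUFFER) for a too-small buffer without knowing what the indexer really expects, and "text/html" is just a plausible MIME type. The types are stand-ins so it compiles without the Windows SDK, and CopyToCallerBuffer is a made-up helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cwchar>

namespace sketch {

// Stand-ins for the Windows SDK definitions.
using HRESULT = std::int32_t;
using DWORD = std::uint32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003u);
// HRESULT_FROM_WIN32(ERROR_INSUFFICIENT_BUFFER); choosing it is an assumption.
constexpr HRESULT E_INSUFFICIENT_BUFFER = static_cast<HRESULT>(0x8007007Au);

// Hypothetical helper: copy a null-terminated string into a caller-supplied
// buffer of dwSize characters and report the length written.
inline HRESULT CopyToCallerBuffer(const wchar_t* source, wchar_t* dest,
                                  DWORD dwSize, DWORD* pdwLength) {
    if (source == nullptr || dest == nullptr || pdwLength == nullptr) {
        return E_POINTER;
    }
    const std::size_t length = std::wcslen(source);
    if (length + 1 > dwSize) {  // no room for the text plus its terminator
        *pdwLength = 0;
        return E_INSUFFICIENT_BUFFER;
    }
    std::wmemcpy(dest, source, length + 1);   // copies the terminator too
    *pdwLength = static_cast<DWORD>(length);  // assumed: excludes the null
    return S_OK;
}

// Sketch of GetDocFormat for a resource whose content is HTML.
inline HRESULT GetDocFormat(wchar_t wszDocFormat[], DWORD dwSize,
                            DWORD* pdwLength) {
    return CopyToCallerBuffer(L"text/html", wszDocFormat, dwSize, pdwLength);
}

// Usage: a large enough buffer receives the MIME type and its length.
inline void usageExample() {
    wchar_t buffer[32] = {};
    DWORD length = 0;
    assert(GetDocFormat(buffer, 32, &length) == S_OK);
    assert(length == 9 && std::wcscmp(buffer, L"text/html") == 0);
}

}  // namespace sketch
```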

The final two methods alter how the indexing occurs - GetRedirectedURL and IsDirectory.

GetRedirectedURL allows you to return the actual url that should be used while indexing. In other words, if you have a document at a url that gets redirected, this allows you to tell the indexer that a) it's been redirected, and b) any relative links that your IFilter emits are to be resolved against the redirected url. I don't know if this causes the previously stored url to be updated.

IsDirectory tells the indexer that the current url represents a directory. Surprising that. This means the indexer will treat any emitted child urls as being in this folder. Handy for using the "in:" and "under:" search syntax. (Think of searching in Outlook - "in:Inbox", "in:myfolder", "under:trash").
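Note that IsDirectory has no out parameter, so the answer has to travel in the HRESULT itself. As far as I can tell that means S_OK for "yes" and S_FALSE for "no". A quick sketch under that assumption: the HRESULT values are stand-ins so it compiles without the Windows SDK, and both RssAccessor and the trailing-slash url scheme are invented purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

namespace sketch {

// Stand-ins for the Windows SDK definitions.
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT S_FALSE = 1;

// Hypothetical accessor. Invented url scheme: a url ending in '/' is a
// feed (directory); anything else is a single item.
class RssAccessor {
public:
    explicit RssAccessor(std::wstring url) : url_(std::move(url)) {}

    // S_OK means "directory", S_FALSE means "not a directory" (assumed).
    HRESULT IsDirectory() const {
        const bool endsWithSlash = !url_.empty() && url_.back() == L'/';
        return endsWithSlash ? S_OK : S_FALSE;
    }

private:
    std::wstring url_;
};

// Usage: compare against S_OK explicitly.
inline void usageExample() {
    assert(RssAccessor(L"rss://feeds/example/").IsDirectory() == S_OK);
    assert(RssAccessor(L"rss://feeds/example/item1").IsDirectory() == S_FALSE);
}

}  // namespace sketch
```

The trap here is that S_FALSE is still a success code, so a SUCCEEDED()-style check passes for both answers. The caller has to compare against S_OK.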

So that's IUrlAccessor. Doesn't look too tricky. I think the next thing to look at will be returning links from IFilter - this is how we're going to crawl the whole store. Then it'll have to be how to represent the RSS feed store as urls. Hopefully then I'll be able to get at some code - although threading might rear its ugly head...

Tags:

Windows Desktop Search

Comments (35) -

Sanin Saracevic
7/6/2007 9:22:53 AM #

RE: Implementing IUrlAccessor

Have you tried implementing BindToStream? I am having a hard time getting it to work and index the contents of the stream...
