Re: Limiting a search by URL from Alan Kent on 2003-01-16 (www-zig@w3.org from January 2003)

From: Alan Kent <ajk@mds.rmit.edu.au>
Date: Thu, 16 Jan 2003 12:48:55 +1100
To: www-zig@w3.org
Message-ID: <20030116124855.A28839@io.mds.rmit.edu.au>

I might be missing something, but was there an implicit desire to do more
than just substring or word matching? Examples I saw included (when searching
for a.b):

	a.b
	a.b/url.stuff
	http://a.b/url.stuff
	x.y.a.b

Is the goal to make sure that only the host name part of a URL is
searched? That is, so that the following URLs do not match?

	/a.b
	http://this.that/a.b.c
    
etc. And should x.y match http://w.x.y.z/?

If it's enough just to match words in a URL, then I would express the query
the same way as words in normal text. Left+right truncation would give
less accurate answers: a query for a.b would also match aaa.bbb, for
example. But if you want extra context (e.g. must be the trailing part of
the domain name of a URL), then something more is needed. For example, a
USE attribute that is bound only to the domain name of a URL. (It's the
system's responsibility to find the domain name, not the profile definition.)
For example, the following URLs would have the following USE attribute
values computed:

	a.b			a.b
	a.b/url.stuff		a.b
	http://a.b/url.stuff	a.b
	x.y.a.b			x.y.a.b
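
To make the table above concrete, here is a rough sketch of how a system
might compute that value. It is only an illustration, not something any
profile defines: the function name domain_use_value is made up, and the
rule for URLs without a scheme (prefix "//" so the parser sees a host) is
my own assumption.

```python
from urllib.parse import urlsplit


def domain_use_value(url: str) -> str:
    """Return the host-name part of a URL, or "" if it has none."""
    # urlsplit only recognises a host after "//", so input without a scheme
    # such as "a.b/url.stuff" gets a "//" prefix first (assumption of this
    # sketch). Input starting with "/" is treated as a path with no host.
    if "://" not in url and not url.startswith("/"):
        url = "//" + url
    return urlsplit(url).hostname or ""


for url in ["a.b", "a.b/url.stuff", "http://a.b/url.stuff", "x.y.a.b", "/a.b"]:
    print(url, "->", repr(domain_use_value(url)))
```

With this rule, /a.b and http://this.that/a.b.c no longer yield a.b as the
domain name, so they would not match a search bound to that attribute.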

But it's still not possible to do a right-anchored, word-based search.
(That would need a 'last-words-in-field' operator.) Character-level
truncation is a mistake, I think, as a.b will match aa.b.
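
The difference between the two can be sketched as follows. This assumes
the domain name has already been extracted, and that "words" are simply
the dot-separated labels; last_words_match is a hypothetical stand-in for
the 'last-words-in-field' operator, which as far as I know does not exist
anywhere today.

```python
def words(value: str) -> list[str]:
    """Split a domain name or query into lower-cased words on '.'."""
    return [w for w in value.lower().split(".") if w]


def left_truncated_match(query: str, host: str) -> bool:
    """Character-level left truncation: host ends with the query string."""
    return host.lower().endswith(query.lower())


def last_words_match(query: str, host: str) -> bool:
    """Hypothetical 'last-words-in-field': the query words must be the
    final whole words of the host."""
    q, h = words(query), words(host)
    return bool(q) and h[-len(q):] == q


for host in ["a.b", "x.y.a.b", "aa.b"]:
    print(host, left_truncated_match("a.b", host), last_words_match("a.b", host))
```

Both functions accept a.b and x.y.a.b, but only the character-level one
also accepts aa.b, which is the false match described above.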


In summary:

* The best way to do it, I think, is to have a USE attribute bound
  only to the domain name, plus a new 'last-words-in-field' operator.
  But I don't think anyone supports this today.

* The practical way to do it, I think, is to use adjacency with words
  (which I believe is what everyone else is saying). It will give
  false matches sometimes, but it will work today.
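
To show what the adjacency approach does and where its false matches come
from, here is a small sketch. The tokenisation (split on anything that is
not a letter or digit) is an assumption of mine; a real system's word
rules may differ.

```python
import re


def url_words(text: str) -> list[str]:
    """Break a URL or query into lower-cased words (assumed tokenisation)."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", text.lower()) if w]


def adjacent_match(query: str, url: str) -> bool:
    """True if the query words occur next to each other, in order,
    anywhere in the URL."""
    q, u = url_words(query), url_words(url)
    if not q:
        return False
    return any(u[i:i + len(q)] == q for i in range(len(u) - len(q) + 1))


for url in ["http://a.b/url.stuff", "x.y.a.b", "aaa.bbb",
            "/a.b", "http://this.that/a.b.c"]:
    print(url, adjacent_match("a.b", url))
```

Unlike truncation, this rejects aaa.bbb. It still accepts /a.b and
http://this.that/a.b.c, because nothing ties the words to the domain name,
and likewise x.y matches http://w.x.y.z/.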

Alan

Received on Wednesday, 15 January 2003 20:49:29 UTC