- From: Alan Kent <ajk@mds.rmit.edu.au>
- Date: Thu, 16 Jan 2003 12:48:55 +1100
- To: www-zig@w3.org
I might be missing something, but was there an implicit desire to do more than just substring or word matching? The examples I saw (when searching for a.b) included:

    a.b
    a.b/url.stuff
    http://a.b/url.stuff
    x.y.a.b

Is the goal to make sure only the host name part of a URL is searched? That is, so that the following URLs do not match?

    /a.b
    http://this.that/a.b.c
    etc.

And should x.y match http://w.x.y.z/?

If it's enough just to do words in a URL, then I would express the query the same way as words in normal text. Left+right truncation would give less accurate answers: aaa.bbb would match a.b, for example.

But if you want extra context (e.g. must be the trailing bit of the domain name part of a URL), then something more is needed. For example, a USE attribute that is bound only to the domain name of a URL. (It's the system's responsibility to find the domain name, not the profile definition's.) The following URLs would then have the following USE attribute values computed:

    URL                     USE attribute value
    a.b                     a.b
    a.b/url.stuff           a.b
    http://a.b/url.stuff    a.b
    x.y.a.b                 x.y.a.b

But even then it's not possible to do a right-anchored, word-based search. (That is, a 'last-words-in-field' operator.) Character-level truncation I think is a mistake, as a.b will match aa.b.

In summary:

* The best way to do it, I think, is to have a USE attribute bound only to the domain name, plus a new 'last-words-in-field' operator. But no one supports this today, I think.

* The practical way to do it, I think, is to use adjacency with words (which is what everyone else is saying, I believe). It will give false matches sometimes, but will work today.

Alan
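
To make the difference between the three approaches above concrete, here is a small Python sketch. It is only an illustration, not Z39.50 or any server's API: the function names (host_of, words, substring_match, adjacency_match, last_words_match) are hypothetical, and the host extraction is deliberately naive (it ignores ports and user info).

```python
import re

def host_of(url):
    """Naive host extraction: drop an optional scheme, keep text before the first '/'."""
    rest = re.sub(r'^[A-Za-z][A-Za-z0-9+.-]*://', '', url)
    return rest.split('/', 1)[0]

def words(text):
    """Split into lower-case words on any non-alphanumeric character."""
    return [w for w in re.split(r'[^A-Za-z0-9]+', text.lower()) if w]

def substring_match(query, url):
    """Character-level left+right truncation: plain substring test."""
    return query.lower() in url.lower()

def adjacency_match(query, url):
    """Word adjacency: the query words appear consecutively anywhere in the URL."""
    q, t = words(query), words(url)
    return bool(q) and any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

def last_words_match(query, url):
    """Assumed 'last-words-in-field' operator applied to the host name only."""
    q, h = words(query), words(host_of(url))
    return bool(q) and h[-len(q):] == q

if __name__ == "__main__":
    urls = ["a.b", "a.b/url.stuff", "http://a.b/url.stuff", "x.y.a.b",
            "/a.b", "http://this.that/a.b.c", "aa.b"]
    for u in urls:
        print(u, substring_match("a.b", u), adjacency_match("a.b", u),
              last_words_match("a.b", u))
```

In this sketch the substring test accepts aa.b, word adjacency rejects aa.b but still accepts http://this.that/a.b.c, and only the host-bound last-words test rejects both /a.b and http://this.that/a.b.c while accepting the four wanted examples.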
Received on Wednesday, 15 January 2003 20:49:29 UTC