Re: bib-1 truncation attribute 105: word masking

Ray Denenberg wrote:

>wald@library.ho.lucent.com wrote:
>
>>  Realized better what is bothering me about this proposal.
>>The proposal says:
>>      1.  A single asterisk (*) is used to mask zero or more characters.
>>      4.  A single vertical bar (|) is used to mask zero or more words.
>>
>>Question is how is     a*b   different from   a|b
>
>Suppose you search on "search" and you want to retrieve "amalgamated search"
>but not "amalgamated research". Then "*search"  won't help but "|search"
>will.
>
>--Ray
>
As far as I know, no standard regular expression syntax has a symbol 
that behaves like the "|" in the proposal. If one exists, please pardon 
this message and point me to it. Otherwise, read on.

In many UNIX commands (GNU grep and vi, for example), you can use "\<" 
to match the left boundary of a word and "\>" to match the right 
boundary; many high level programming languages offer an equivalent, 
usually written "\b". The example above should work with "\<search": 
this RE matches "amalgamated search" but not "amalgamated research".

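Nearly the same result is given by "\Wsearch" ("\W" matches any 
character that can't be part of a word). The difference is that "\W" 
has to match an actual character, so it misses a "search" at the very 
beginning of the field.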
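
To make the comparison concrete, here is a minimal sketch in Python. It 
is only an illustration: Python's re module does not accept "\<", so 
"\b" stands in for it, and the list of titles is made up for the 
example.

```python
import re

# Hypothetical sample data, chosen to mirror the example in this thread.
titles = ["amalgamated search", "amalgamated research", "search engines"]

# "\b" is a zero-width word-boundary assertion; it stands in for "\<" here.
boundary = re.compile(r"\bsearch")

# "\W" consumes one non-word character, so it needs something before "search".
nonword = re.compile(r"\Wsearch")

boundary_hits = [t for t in titles if boundary.search(t)]
nonword_hits = [t for t in titles if nonword.search(t)]

print(boundary_hits)
print(nonword_hits)
```

Both patterns reject "amalgamated research", but only the boundary 
version also accepts a title that begins with "search".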

There could well be situations in which "matching zero or more words" is 
not equivalent to "matching word boundaries", but if they are very few 
or don't exist at all, it would be nice to reword the proposal in terms 
of word boundaries instead of whole words, because standard solutions 
for handling word boundaries are already available (I'm almost sure that 
Perl and Java support this kind of RE, for example).

Another note: how many DBMSs support word matching or word-boundary 
matching? As far as I know, standard SQL does not, for example. The 
proposal should take into account whether attribute 105 can actually be 
implemented with small effort.
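
As a rough illustration of the SQL side, the sketch below uses SQLite 
through Python's sqlite3 module as a stand-in for a generic DBMS. The 
"records" table and its contents are hypothetical, and the workaround 
shown treats only a single space as a word separator, which is an 
assumption and not something the proposal specifies.

```python
import sqlite3

# In-memory database with a hypothetical table of titles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (title TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?)",
    [("amalgamated search",), ("amalgamated research",), ("search",)],
)

# Plain LIKE behaves like "*search": it cannot tell "search" from "research".
suffix_hits = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM records WHERE title LIKE ? ORDER BY title",
        ("%search",),
    )
]

# Emulating a left word boundary by hand: either the whole field is the
# word, or the word is preceded by a space (assumed to be the separator).
word_hits = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM records WHERE title = ? OR title LIKE ? ORDER BY title",
        ("search", "% search"),
    )
]

print(suffix_hits)
print(word_hits)
conn.close()
```

Even this small case needs two conditions per separator character, 
which hints at how much effort a full implementation on top of LIKE 
alone would take.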

Best regards.

-- 
Andrea Giuliano, Ph. D.
Virtual System Administrator
ICCU - Istituto Centrale per il Catalogo Unico
Viale Castro Pretorio 105, Rome - ITALY
Tel. +39064989509, Fax +39064059302

Received on Monday, 9 September 2002 05:13:00 UTC