Re: Proposed change to Truncation attribute 104

On Thu, Mar 28, 2002 at 09:47:10AM -0500, LeVan,Ralph wrote:
> The ZIG has added a type 104 truncation attribute that reflects the
> semantics from the Z39.50 CCL query.  We have used that syntax internally at
> OCLC for many years now and have a problem with the syntax for the '?'
> character.
...
> The problem arises when a '?' is embedded inside a numeric term.  There is
> no way to tell what digits relate to the '?' and what digits are part of the
> term.

I personally don't have any objection to your proposal. But I thought
I would mention some other issues that I have come across to see if
they are worth addressing or not.

I prefer a feature to cover all cases, without odd edge conditions
where that can be avoided. The present definition has a gap: how do
you search for text containing a literal ? character when you also
want to use CCL truncation characters in the same term?

In a CCL query, we do this using quotation marks (double quotes ").
Any character that appears inside double quotes is released, that is,
treated as literal text (two double quotes in a row inside a quoted
string release the double-quote character itself).

    FIND abc?		abc right truncated
    FIND "abc?"		abc? as literal text
    FIND "abc"?		abc right truncated
    FIND ab?cd		word with ab at front and cd at end
    FIND "ab?cd"	ab?cd as literal text
    FIND "ab"?"cd"	word with ab at front and cd at end
    FIND "ab"?3"cd"	word with ab at front and cd at end with at most
    				3 chars in between
    FIND "ab"?"3cd"	word with ab at front and 3cd at end
    FIND "ab""cd"	ab"cd as word
    FIND a"b"c"d"e      abcde as word
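
To make the quoting rules above concrete, here is a minimal sketch in
Python that turns one term into a regular expression. It is only an
illustration, not how our system is implemented. The function names
(ccl_term_to_regex, ccl_match) are made up for this example, and it
assumes that # masks exactly one character, that a bare ? matches any
number of characters, and that ?n matches at most n characters.

```python
import re

DIGITS = "0123456789"

def ccl_term_to_regex(term: str) -> str:
    """Translate one CCL-style term into a regex (hypothetical helper)."""
    out = []
    i, n = 0, len(term)
    in_quotes = False
    while i < n:
        ch = term[i]
        if in_quotes:
            if ch == '"':
                # Two double quotes in a row inside quotes: literal quote.
                if i + 1 < n and term[i + 1] == '"':
                    out.append(re.escape('"'))
                    i += 2
                    continue
                in_quotes = False  # a single quote closes the literal run
            else:
                out.append(re.escape(ch))  # everything else is literal
            i += 1
        elif ch == '"':
            in_quotes = True
            i += 1
        elif ch == '?':
            # Unquoted digits directly after ? are a length limit.
            j = i + 1
            while j < n and term[j] in DIGITS:
                j += 1
            digits = term[i + 1:j]
            out.append(".{0,%d}" % int(digits) if digits else ".*")
            i = j
        elif ch == '#':
            out.append(".")  # assumed: exactly one character
            i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    if in_quotes:
        raise ValueError("unterminated double quote in term")
    return "".join(out)

def ccl_match(term: str, word: str) -> bool:
    """True if the whole word matches the term (hypothetical helper)."""
    return re.fullmatch(ccl_term_to_regex(term), word) is not None
```

With this reading, a numeric term is no longer ambiguous: "12"?"345"
means 12, anything, then 345, while "12"?3"45" means 12, at most three
characters, then 45.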

We also do the following (for information only; it is debatable
whether this belongs in the proposal):

    FIND 3		find all terms numerically equal to 3 (3, 3.0, 03 etc)
    FIND "3"		find 3, not 3.0 or 03 etc
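
A rough sketch of that distinction in Python, again purely
illustrative: term_matches and _as_number are hypothetical names, the
use of Decimal for "numerically equal" is my assumption for the
example, and the quoted case here ignores embedded "" escapes.

```python
from decimal import Decimal, InvalidOperation

def _as_number(text):
    """Return text as a finite Decimal, or None if it is not a number."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None

def term_matches(query: str, indexed: str) -> bool:
    """Compare a query term against one indexed term (hypothetical)."""
    # A fully quoted query is released: exact string comparison only.
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        return indexed == query[1:-1]
    # An unquoted query that looks numeric is compared by value.
    q, t = _as_number(query), _as_number(indexed)
    if q is not None and t is not None:
        return q == t
    return query == indexed
```

So term_matches('3', '3.0') and term_matches('3', '03') both succeed,
while term_matches('"3"', '3.0') does not.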

I thought it was worth raising as a possible alternative to your proposal.
That is, add double quotes as a release mechanism to the 104 attribute
instead of limiting to 1 digit after a ?. Sending through the double
quotes allows '?' to be searched for as literal text in a string
(so you can mix CCL pattern match operators *and* the ?/# symbols in text).

It's a little more radical than your proposed change (double
quotes suddenly get a meaning). But I am not sure it's that much
of an issue, because I suspect not many people have implemented 104
at all. Adding double quotes means all possible patterns can be
expressed unambiguously. The current 104 explicitly prohibits
searching for ? and # as text (they are always operators; there is
no release mechanism). Adding meaning to double quotes solves your
problem and more.

Alan

Received on Wednesday, 3 April 2002 04:28:27 UTC