- From: Alan Kent <ajk@mds.rmit.edu.au>
- Date: Wed, 3 Apr 2002 19:10:34 +1000
- To: "LeVan,Ralph" <levan@oclc.org>
- Cc: "ZIG Mailing List (E-mail)" <www-zig@w3.org>
On Thu, Mar 28, 2002 at 09:47:10AM -0500, LeVan,Ralph wrote:
> The ZIG has added a type 104 truncation attribute that reflects the
> semantics from the Z39.50 CCL query. We have used that syntax internally at
> OCLC for many years now and have a problem with the syntax for the '?'
> character.
...
> The problem arises when a '?' is embedded inside a numeric term. There is
> no way to tell what digits relate to the '?' and what digits are part of the
> term.
I personally don't have any objection to your proposal. But I thought
I would mention some other issues that I have come across to see if
they are worth addressing or not.
I personally prefer a feature to cover all cases and not have weird
edge conditions if it can be helped. The present definition has a
problem: how do you search for text containing a literal ? character
while still using CCL truncation characters elsewhere in the term?
In a CCL query, we do this using restoration marks (double quotes ").
Any character that appears inside double quotes is released, that is,
treated as literal text (two double quotes in a row release the
double-quote character itself).
    FIND abc?          abc right truncated
    FIND "abc?"        abc? as literal text
    FIND "abc"?        abc right truncated
    FIND ab?cd         word with ab at front and cd at end
    FIND "ab?cd"       ab?cd as literal text
    FIND "ab"?"cd"     word with ab at front and cd at end
    FIND "ab"?3"cd"    word with ab at front and cd at end, with at most
                       3 chars in between
    FIND "ab"?"3cd"    word with ab at front and 3cd at end
    FIND "ab""cd"      ab"cd as word
    FIND a"b"c"d"e     abcde as word
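To make the release rule concrete, here is a rough sketch in Python of how a single search term (the part after FIND) could be split into literal text and operators. It is only an illustration, not anyone's actual implementation: the function name `tokenize` and the token labels are made up, and it assumes that # masks exactly one character and that an unquoted ? swallows every digit that directly follows it as its character limit.

```python
def tokenize(term):
    """Split one CCL search term into a list of tokens.

    Tokens (labels are hypothetical, chosen for this sketch):
      ("lit", text)    literal text to match as-is
      ("trunc", n)     the ? operator; n is the optional limit, or None
      ("mask", None)   the # operator (assumed: exactly one character)
    """
    tokens = []
    literal = []          # characters of the literal run being collected
    in_quotes = False
    i, n = 0, len(term)

    def flush():
        # Emit the pending literal run, if any, before an operator token.
        if literal:
            tokens.append(("lit", "".join(literal)))
            literal.clear()

    while i < n:
        ch = term[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and term[i + 1] == '"':
                    literal.append('"')   # "" inside quotes is a literal "
                    i += 2
                else:
                    in_quotes = False     # closing quote
                    i += 1
            else:
                literal.append(ch)        # everything in quotes is released
                i += 1
        elif ch == '"':
            in_quotes = True              # opening quote
            i += 1
        elif ch == "?":
            flush()
            j = i + 1
            while j < n and term[j] in "0123456789":
                j += 1                    # unquoted digits belong to the ?
            tokens.append(("trunc", int(term[i + 1:j]) if j > i + 1 else None))
            i = j
        elif ch == "#":
            flush()
            tokens.append(("mask", None))
            i += 1
        else:
            literal.append(ch)
            i += 1

    if in_quotes:
        raise ValueError("unterminated double quote in term")
    flush()
    return tokens


print(tokenize('"ab"?3"cd"'))   # limit of 3 between ab and cd
print(tokenize('"ab"?"3cd"'))   # no limit, literal 3cd at the end
```

With quoting, the digit after ? is a limit only when it sits outside the quotes, so the two terms above are told apart without having to restrict how many digits may follow a ?.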
We also do (for information only - debatable whether it should be in
the proposal)
    FIND 3      find all terms numerically equal to 3 (3, 3.0, 03 etc)
    FIND "3"    find 3, not 3.0 or 03 etc
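As a rough illustration of that numeric-versus-literal distinction, the comparison could look something like the Python sketch below. The function name `term_matches` is hypothetical, and using Decimal for "numerically equal" is an assumption made for the sketch, not a description of how our system does it.

```python
from decimal import Decimal, InvalidOperation


def term_matches(query, indexed):
    """Compare one query term against one indexed term.

    A term wrapped in double quotes is compared as literal text.
    An unquoted term is compared numerically when both sides parse as
    numbers, and as plain text otherwise.
    """
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        return indexed == query[1:-1]      # quoted: exact text only
    try:
        return Decimal(query) == Decimal(indexed)
    except InvalidOperation:
        return query == indexed            # not numeric: plain comparison


print(term_matches('3', '3.0'))     # unquoted 3 against 3.0
print(term_matches('"3"', '3.0'))   # quoted 3 against 3.0
```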
I thought it was worth raising as a possible alternative to your proposal.
That is, add double quotes as a release mechanism to the 104 attribute
instead of limiting to 1 digit after a ?. Sending through the double
quotes allows '?' to be searched for as literal text in a string
(so you can mix CCL pattern match operators *and* the ?/# symbols in text).
It's a little more radical than your proposed change (double
quotes suddenly get a meaning). But I am not sure it's that much
of an issue - because I suspect not many people have implemented 104
at all. Adding double quotes means all possible patterns can be
expressed unambiguously - the current 104 explicitly prohibits
searching for ? and # as text (they are always operators - there is
no release mechanism). Adding meaning to double quotes solves your
problem and more.
Alan
Received on Wednesday, 3 April 2002 04:28:27 UTC