
Re: Bath-profile: relation to national profiles...................... ...... J.nr. 331-3

From: Jacob Hallén <jacob@netg.se>
Date: Sun, 23 Sep 2001 17:44:30 +0200 (CEST)
To: Johan Zeeman <joe.zeeman@tlcdelivers.com>
cc: Leif Andresen <LEA@bs.dk>, BATH-PROFILE-L@INFOSERV.NLC-BNC.CA, www-zig@w3.org, "Majordomo danZIG (E-mail)" <danzig@list.dbc.dk>
Message-ID: <Pine.LNX.3.96.1010923170730.19863A-100000@valdez.netg.se>
I think the "first in field" and "first in subfield" positions raise a
number of really ugly issues, especially in an international profile.

The first one is that it builds assumptions about how the data is stored
into an otherwise abstract search notation. Suppose I store my data in the
LIBRISMARC format, where use attribute X has its data stored in field X,
subfield $a and field Y, subfields $a and $b, and a user searches my
database and retrieves the records in MARC21 format, where attribute X
is normally stored in field Z, subfield $c. That user will be seriously
confused by what I consider to be first in field. Indeed, I am seriously
confused about what I should consider to be first in field, or first in
subfield.
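
To make the confusion concrete, here is a rough sketch in Python. The field
tags X, Y and Z are the placeholders used above; the record contents, the
dictionary layout and the two helper functions are invented for
illustration and do not describe any real system.

```python
# Hypothetical mapping of one abstract use attribute to (field, subfields).
LIBRIS_LAYOUT = [("X", ["a"]), ("Y", ["a", "b"])]
MARC21_LAYOUT = [("Z", ["c"])]

# The same title as it might be carried in each format (made-up data).
libris_record = {
    "X": {"a": "Blue horizon"},
    "Y": {"a": "Stories", "b": "Blue horizon and others"},
}
marc21_record = {
    "Z": {"a": "Collected works", "c": "Blue horizon"},
}


def first_in_field(record, layout, word):
    """True if any mapped field, read from its first subfield, starts with word."""
    for tag, _subfields in layout:
        field = record.get(tag)
        if not field:
            continue
        # The whole field counts here, including subfields the use
        # attribute is not mapped to.
        text = " ".join(field.values())
        if text.lower().split()[:1] == [word.lower()]:
            return True
    return False


def first_in_subfield(record, layout, word):
    """True if any mapped subfield starts with word."""
    for tag, subfields in layout:
        field = record.get(tag, {})
        for code in subfields:
            if field.get(code, "").lower().split()[:1] == [word.lower()]:
                return True
    return False


libris_hit = first_in_field(libris_record, LIBRIS_LAYOUT, "blue")
marc21_hit = first_in_field(marc21_record, MARC21_LAYOUT, "blue")
marc21_subfield_hit = first_in_subfield(marc21_record, MARC21_LAYOUT, "blue")
```

Under these assumptions the same search for "blue" as first in field
matches against the LIBRISMARC layout but not against the MARC21 layout,
where the word is only first in subfield $c.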

I may also have my data in a completely different format, and just
generate MARC output on demand.

My conclusion is that "first in field" and "first in subfield" only work
within the context of a single MARC format, requiring the indices of the
database to be adapted to that MARC format. 

I don't think such limitations are wanted in an international profile.
It might be possible to address the problems by having some clear wording
in the profile about how to handle the conversion between abstract and
actual search entry points, but then we still have the problem that many
search engines seem to be lacking the capability to handle word positions
relative to the start and end of the string.

I expect that there may be problems with things like filing marks and stop
words as well. Stop words are not indexed, so a search may generate hits
that are not in the set the user would expect. The user looks for a title
starting with "Blue", and uses a Title Search - First words in field,
expecting to get rid of all "The blue", "A blue", "In blue" etc. With the
stop words missing from the index, those titles come back anyway.

Again, some careful wording may clarify the situation.

Jacob Hallén
AB Strakt

 On Fri, 21 Sep 2001, Johan Zeeman wrote:

> Leif;
> 
> ----- Original Message -----
> From: "Leif Andresen" <LEA@bs.dk>
> To: <BATH-PROFILE-L@INFOSERV.NLC-BNC.CA>; <www-zig@w3.org>
> Cc: "Majordomo danZIG (E-mail)" <danzig@list.dbc.dk>
> Sent: Friday, September 21, 2001 10:59 AM
> Subject: Bath-profile: relation to national profiles......................
> ...... J.nr. 331-3
> 
> 
> <snip>
> >
> > Position "first in field" (value 1) gives us real problems. Library
> > systems on the Danish market don't support the function "First Words in
> > Field". We have discussed it and don't see the reason for this function.
> > We use the combination:
> > Position = Any
> > Structure = Phrase
> > Completeness = incomplete subfield
> > for two or more words in connection. The Danish tradition is to use SCAN
> > for a "complete phrase" to support a user who knows the word/words to
> > start a title.
> > It seems not sensible to demand new indexes where the use isn't obvious.
> >
> 
> It has been, and remains, my view that "first in field" was intended to be
> synonymous with "field starts with" or "left-anchored".  Similarly "first
> in subfield" is synonymous with "subfield starts with".  This is also the
> view of the Bib-1 semantics document.  Thus an operand with structure "word"
> and position "first in field" should find those records in which the
> appropriate field starts with the word in the term.  And an operand with
> structure "phrase" and position "anywhere in subfield" should find records
> in which any subfield of the appropriate field contains anywhere in it the
> phrase in the term.  An operand with structure "phrase" and position "first
> in field" does not mean that the words in the phrase need to be among the
> first words in the field, but that the field needs to begin with the term.
> 
> 
> This implies that, in order to specify a left-anchored phrase search (e.g.
> names starting "Smith, John"), you need to specify that structure is a
> "phrase" and position is "first in field".  Just stating that a term is a
> phrase (i.e. one or more words) does not tell the database anything about
> how to match the term against its indexes - it is wrong to assume that
> "phrase" means "left-anchored", just as it is wrong to assume that "word"
> means "unbounded" or "process as keyword".  And certainly omitting the
> position attribute does not conform with the Bath principle that a value of
> each attribute type is required.
> 
> J. Zeeman
> The Library Corporation
> http://www.tlccarl.com
> 
> 
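
For reference, the attribute combinations discussed in the quoted messages
can be written out as Bib-1 attribute lists. The sketch below, in Python,
renders them in prefix query notation (PQF) and refuses to build a query
unless every attribute type has a value, in line with the Bath principle
mentioned above. The helper pqf() is hypothetical, and the use, relation
and truncation values (and the completeness value for the name search) are
my assumptions, since neither message states them.

```python
# Bib-1 attribute types: type number -> name.
ATTRIBUTE_TYPES = {
    1: "use",
    2: "relation",
    3: "position",
    4: "structure",
    5: "truncation",
    6: "completeness",
}


def pqf(term, **attrs):
    """Build a PQF string, insisting on a value for every attribute type."""
    missing = [name for name in ATTRIBUTE_TYPES.values() if name not in attrs]
    if missing:
        raise ValueError("missing attribute types: " + ", ".join(missing))
    parts = [f"@attr {num}={attrs[name]}" for num, name in ATTRIBUTE_TYPES.items()]
    return " ".join(parts) + f' "{term}"'


# Left-anchored phrase: position 1 (first in field), structure 1 (phrase).
# Assumed: use 1003 (author), relation 3 (equal), truncation 1 (right),
# completeness 1 (incomplete subfield).
left_anchored = pqf(
    "Smith, John",
    use=1003, relation=3, position=1, structure=1, truncation=1, completeness=1,
)

# The Danish combination: position 3 (any), structure 1 (phrase),
# completeness 1 (incomplete subfield).
# Assumed: use 4 (title), relation 3 (equal), truncation 100 (do not truncate).
danish_phrase = pqf(
    "blue lagoon",
    use=4, relation=3, position=3, structure=1, truncation=100, completeness=1,
)
```

Calling pqf() without, say, a position value raises ValueError, which
mirrors the point that omitting the position attribute does not conform to
the profile.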

        NetGuide                        http://www.netg.se/
        TerraTel AB                     jacob@netg.se
        Tankegången 4                   031 - 50 79 40
        417 56 Göteborg
Received on Sunday, 23 September 2001 11:45:11 UTC
