- From: Jacob Hallén <jacob@netg.se>
- Date: Sun, 23 Sep 2001 17:44:30 +0200 (CEST)
- To: Johan Zeeman <joe.zeeman@tlcdelivers.com>
- cc: Leif Andresen <LEA@bs.dk>, BATH-PROFILE-L@INFOSERV.NLC-BNC.CA, www-zig@w3.org, "Majordomo danZIG (E-mail)" <danzig@list.dbc.dk>
I think the "first in field" and "first in subfield" positions raise a number of really ugly issues, especially in an international profile.

The first one is that they make assumptions about how the data is stored, in what is otherwise an abstract search notation. Suppose I store my data in the LIBRISMARC format, where use attribute X has its data stored in field X, subfield $a, and in field Y, subfields $a and $b. A user then searches my database and retrieves the data in MARC21 format, where attribute X is normally stored in field Z, subfield $c. That user will be seriously confused by what I consider to be first in field. Indeed, I am seriously confused about what I should consider to be first in field, or first in subfield. I may also have my data in a completely different format, and just generate MARC output on demand.

My conclusion is that "first in field" and "first in subfield" only work within the context of a single MARC format, requiring the indices of the database to be adapted to that MARC format. I don't think such limitations are wanted in an international profile.

It might be possible to address the problems with some clear wording in the profile about how to handle the conversion between abstract and actual search entry points, but then we still have the problem that many search engines seem to lack the capability to handle word positions relative to the start and end of the string.

I expect that there may be problems with things like filing marks and stop words as well. Stop words are not indexed, so a search will generate hits that are not in the set the user would expect. The user looks for a title starting with "Blue", and uses a Title Search - First words in field, expecting to get rid of all "The blue", "A blue", "In blue" etc., but those titles come back anyway. Again, some careful wording may clarify the situation.
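To make the stop word problem concrete, here is a minimal sketch in Python. It does not describe any real system: the stop word list, the function names and the sample titles are all made up for illustration. It just compares a "first in field" match against an index that keeps every word with one that drops stop words.

```python
# Hypothetical stop word list and helper names; not taken from any real system.
STOP_WORDS = {"the", "a", "an", "in"}


def index_words(title, drop_stop_words):
    """Split a title into lowercase index words, optionally dropping stop words."""
    words = title.lower().split()
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def first_in_field(term, title, drop_stop_words):
    """True if the indexed field begins with the words of the term."""
    term_words = term.lower().split()
    field_words = index_words(title, drop_stop_words)
    return field_words[:len(term_words)] == term_words


# Made-up sample titles.
titles = ["Blue moon", "The blue hotel", "A blue fire", "In blue"]

# Index that keeps every word: only the title that really starts with "Blue".
strict = [t for t in titles if first_in_field("Blue", t, drop_stop_words=False)]

# Index without stop words: the titles the user wanted to exclude match too.
stopped = [t for t in titles if first_in_field("Blue", t, drop_stop_words=True)]

print(strict)
print(stopped)
```

With the stop words dropped from the index, the field appears to start with "blue" in all four titles, which is exactly the set the user was trying to avoid.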
Jacob Hallén
AB Strakt

On Fri, 21 Sep 2001, Johan Zeeman wrote:

> Leif;
>
> ----- Original Message -----
> From: "Leif Andresen" <LEA@bs.dk>
> To: <BATH-PROFILE-L@INFOSERV.NLC-BNC.CA>; <www-zig@w3.org>
> Cc: "Majordomo danZIG (E-mail)" <danzig@list.dbc.dk>
> Sent: Friday, September 21, 2001 10:59 AM
> Subject: Bath-profile: relation to national profiles............................ J.nr. 331-3
>
> > <snip >
> >
> > Position "first in field" (value 1) gives us real problems. Library systems
> > on the Danish market don't support the function "First Words in Field". We
> > have discussed it and don't see the reason for this function. We use the
> > combination:
> > Position = Any
> > Structure = Phrase
> > Completeness = incomplete subfield
> > for two or more words in connection. The Danish tradition is to use SCAN
> > for a "complete phrase" to support a user who knows the word/words to start
> > a title.
> > It seems not sensible to demand new indexes where the use isn't obvious.
> >
>
> It has been, and remains, my view that "first in field" was intended to be
> synonymous with "field starts with" or "left-anchored". Similarly "first
> in subfield" is synonymous with "subfield starts with". This is also the
> view of the Bib-1 semantics document. Thus an operand with structure "word"
> and position "first in field" should find those records in which the
> appropriate field starts with the word in the term. And an operand with
> structure "phrase" and position "anywhere in subfield" should find records
> in which any subfield of the appropriate field contains anywhere in it the
> phrase in the term. An operand with structure "phrase" and position "first
> in field" does not mean that the words in the phrase need to be among the
> first words in the field, but that the field needs to begin with the term.
>
> This implies that, in order to specify a left-anchored phrase search (e.g.
> names starting "Smith, John"), you need to specify that structure is a
> "phrase" and position is "first in field". Just stating that a term is a
> phrase (i.e. one or more words) does not tell the database anything about
> how to match the term against its indexes - it is wrong to assume that
> "phrase" means "left-anchored", just as it is wrong to assume that "word"
> means "unbounded" or "process as keyword". And certainly omitting the
> position attribute does not conform with the Bath principle that a value of
> each attribute type is required.
>
> J. Zeeman
> The Library Corporation
> http://www.tlccarl.com
>

NetGuide           http://www.netg.se/
TerraTel AB        jacob@netg.se
Tankegången 4      031 - 50 79 40
417 56 Göteborg
Received on Sunday, 23 September 2001 11:45:11 UTC