Re: Glossaries (was: footnotes in HTML3...)

Michael Johnson (michaelj@relay.relay.com)
Fri, 30 Jun 95 07:48:20 EDT


Message-Id: <MICHAELJ.950630074820@relay.relay.com>
To: kerog@sp.isl.secom.co.jp (Keith Rogers)

>At  9:43 AM 95.6.7 -0400, Michael Johnson wrote:
>>Keith Rogers writes:
>>>I assume that the glossary link is described in some other
>>>document.  Could you tell me where I could find this information?
>>
>>Check out http://www.hpl.hp.co.uk/people.dsr/html3/CoverPage.html which is a
>>hypertext version of Dave Raggett's HTML 3.0 IETF draft.
>
>First of all, the correct URL is
>
>http://www.hpl.hp.co.uk/people/dsr/html3/CoverPage.html

Sorry, finger check.

>Second of all, as far as I can tell this document makes no mention of
>glossaries.
[chomp]
>This is in the section on hypertext links.  Is there some other section
>which describes glossaries?

Look in the section on the HEAD element. It describes the elements that
can be in a HEAD element, including LINK, and gives a list of standard
relations.

>I hope I understand your example properly.  I believe you were suggesting
>that a script exist on the machine containing the glossary to handle
>individual requests.

That's one possible implementation.

>                      This would (a) require that there be a server capable
>of handling such requests on the computer holding the glossary, something that
>not everyone has access to;

I don't know of ANY http server implementations that can't run CGI scripts.
If the server can serve the original document, it can run a script.

>                            (b) require the html author to know the proper
>script for each individual glossary referenced; and (c) necessitate a new
>connection for every search.

Neither of which is that big a deal. A single well-written script could be
used for many documents. Connections are cheap and getting cheaper.

>(a) and (b) take a relatively simple concept and make it very obscure for
>the average user.  Given a proper standard, this work could be done by
>the browser and hidden from both the glossary author and the author of the
>referencing HTML document.

What can I say? I disagree that there's anything obscure about it. It seems
to me that the current situation constitutes a sufficient standard.

>As far as (c) is concerned, this is appropriate behavior for a *dictionary*
>but not, I believe, for a glossary.  The point of a glossary would be to
>have a smaller list of vocabulary specific to a given subject.  Since it
>would be small, it could be downloaded in its entirety with the document
>and then searched at leisure. Requiring a separate query over the internet
>for each term investigated would be 1) prohibitively slow for many users
>and 2) unnecessarily burdensome on the glossary provider.

So it's OK to burden the server with lots of dictionary requests, but not
lots of glossary requests? How often do YOU actually look at the glossary of
a printed document? I doubt there would be an unacceptable load. And, since
each glossary entry would presumably be small, there should be little problem
with response time.

>A glossary is a very common, accepted form of information.  It's not
>unreasonable to make some sort of provision for it in HTML.  If left
>as a "browser issue" it will never be implemented on a wide basis.

The provision has been made. Any additional restrictions are outside the
scope of the HTML standard. Why limit the creativity of information providers?

>>>should there be some way for an author to specify directly
>>>which glossary he wants to reference for a particular word?
>
>>It seems to me that an author should not overload an acronym in a document,
>
>No, there should not be two possible interpretations of an acronym within
>a document.  However, when referencing a glossary one might encounter two
>such interpretations.

[example chomped]

>             Though you might in general prefer to reference the networking
>one first, in this case you would want to reference the publishing one.

So have the well-written script be able to reference a reasonably simple rule
file that would make this differentiation on the server side. Why bother to
download all that information to the client?
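As a rough illustration of that server-side differentiation, here is a minimal Python sketch. It assumes a hypothetical rules structure: an ordered list of glossaries to consult, plus per-term overrides that send a given term to a particular glossary first. The glossary names, the placeholder acronym `XYZ`, and the `resolve` function are all made up for the example.

```python
from __future__ import annotations

from typing import Optional

def resolve(term: str,
            order: list[str],
            overrides: dict[str, str],
            glossaries: dict[str, dict[str, str]]) -> Optional[str]:
    """Return the HTML entry for `term`, or None if no glossary defines it."""
    # A per-term override (hypothetical rule) moves the named glossary to the
    # front of the search; otherwise the document's default order is used.
    preferred = overrides.get(term)
    search = ([preferred] if preferred else []) + [g for g in order if g != preferred]
    for name in search:
        entry = glossaries.get(name, {}).get(term)
        if entry is not None:
            return entry
    return None

# Placeholder glossaries; "XYZ" stands in for an acronym with two meanings.
glossaries = {
    "networking": {"XYZ": "networking sense of XYZ"},
    "publishing": {"XYZ": "publishing sense of XYZ"},
}
order = ["networking", "publishing"]

print(resolve("XYZ", order, {}, glossaries))                     # networking sense of XYZ
print(resolve("XYZ", order, {"XYZ": "publishing"}, glossaries))  # publishing sense of XYZ
```

Because the rule is applied on the server, the client only ever receives the single entry it asked for.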

I could picture a script which would be referenced by URLs of the following
sort:

  http://www.somewhere.net/cgi-bin/glossary/~joe/document1?word
  http://www.somewhere.net/cgi-bin/glossary/~karen/tech-manual?word
  http://www.somewhere.net/cgi-bin/glossary/~support/installation?word
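Under CGI, the server splits each such URL before the script runs: the part naming the script becomes `SCRIPT_NAME`, the extra path becomes `PATH_INFO`, and everything after the `?` becomes `QUERY_STRING`. The Python sketch below only imitates that split so the three example URLs can be inspected; it assumes the shared script is installed at `/cgi-bin/glossary`, and `cgi_variables` is a hypothetical helper, not part of any server.

```python
from __future__ import annotations

from urllib.parse import urlsplit, unquote

SCRIPT_NAME = "/cgi-bin/glossary"  # assumed install location of the shared script

def cgi_variables(url: str) -> dict[str, str]:
    """Imitate how a CGI server would split a glossary URL for the script."""
    parts = urlsplit(url)
    if not (parts.path == SCRIPT_NAME or parts.path.startswith(SCRIPT_NAME + "/")):
        raise ValueError(f"not a glossary URL: {url!r}")
    return {
        "SCRIPT_NAME": SCRIPT_NAME,
        # Extra path after the script name identifies the document's rules.
        "PATH_INFO": unquote(parts.path[len(SCRIPT_NAME):]),
        # The query is the word being looked up (empty if none was given).
        "QUERY_STRING": parts.query,
    }

for url in (
    "http://www.somewhere.net/cgi-bin/glossary/~joe/document1?word",
    "http://www.somewhere.net/cgi-bin/glossary/~karen/tech-manual?word",
    "http://www.somewhere.net/cgi-bin/glossary/~support/installation?word",
):
    print(cgi_variables(url))
```

One script serves all three documents; only `PATH_INFO` differs between them.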

The glossary script would be kept on the server, and would be referenced by
all authors who want to provide a glossary. The additional path information
after the word "glossary" would be used, by the script, to find the rules
file, read it, and figure out how to resolve the glossary term. Presumably
this would involve finding an appropriate glossary file and extracting the
HTML for the glossary entry.
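A minimal Python sketch of the first and last steps, under stated assumptions: that a `PATH_INFO` of the form `/~user/document` maps to a rules file at `/home/user/public_html/document.rules`, and that the entry goes back as an ordinary CGI response. The directory layout, the `.rules` suffix, and both function names are hypothetical; reading the rules and glossary files themselves is left out.

```python
from __future__ import annotations

import html
from pathlib import PurePosixPath
from typing import Optional

def rules_file_for(path_info: str) -> PurePosixPath:
    """Map '/~joe/document1' to a per-document rules file (assumed layout)."""
    parts = PurePosixPath(path_info).parts  # e.g. ('/', '~joe', 'document1')
    if (len(parts) < 3 or not parts[1].startswith("~")
            or len(parts[1]) < 2 or ".." in parts):
        # Refuse anything else, including attempts to climb out of the tree.
        raise ValueError(f"unexpected PATH_INFO: {path_info!r}")
    user = parts[1][1:]
    doc = PurePosixPath(*parts[2:])
    # Single '.' segments (doc.parent for a bare name) are collapsed by PurePath.
    return PurePosixPath("/home", user, "public_html", doc.parent, doc.name + ".rules")

def cgi_response(word: str, entry_html: Optional[str]) -> str:
    """Build the text the script writes to stdout for one lookup."""
    if entry_html is None:
        body = f"<P>No glossary entry for {html.escape(word)}.</P>"
        return "Status: 404 Not Found\nContent-Type: text/html\n\n" + body + "\n"
    return "Content-Type: text/html\n\n" + entry_html + "\n"

print(rules_file_for("/~joe/document1"))
print(cgi_response("word", None), end="")
```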

You can specify a standard for the glossary rules and the format of the
glossary files without getting HTML itself involved. And you can design it
so that it is flexible and easy to understand.
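To make that concrete, here is one possible pair of formats, entirely hypothetical: a rules file with `glossary <file>` lines giving the search order and `prefer <term> <file>` lines for per-term overrides, and glossary files with one `TERM<tab>HTML` entry per line. The Python sketch parses both from strings; a real script would read them from disk.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Rules:
    glossaries: list[str] = field(default_factory=list)   # search order
    prefer: dict[str, str] = field(default_factory=dict)  # term -> glossary file

def parse_rules(text: str) -> Rules:
    """Parse the hypothetical rules format; '#' starts a comment line."""
    rules = Rules()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword, *args = line.split()
        if keyword == "glossary" and len(args) == 1:
            rules.glossaries.append(args[0])
        elif keyword == "prefer" and len(args) == 2:
            rules.prefer[args[0]] = args[1]
        else:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
    return rules

def parse_glossary(text: str) -> dict[str, str]:
    """Parse 'TERM<tab>HTML' lines; lines without a tab are ignored."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        if "\t" in line:
            term, entry_html = line.split("\t", 1)
            entries[term.strip()] = entry_html.strip()
    return entries

rules = parse_rules("""
# glossaries for document1, searched in this order
glossary networking.gl
glossary publishing.gl
prefer XYZ publishing.gl
""")
print(rules)
print(parse_glossary("XYZ\tpublishing sense of XYZ\n"))
```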

You could even design the script so that if it receives a request of the
form:

  http://www.somewhere.net/cgi-bin/glossary/~joe/document1

i.e. with no query appended, it would go out, read the rules file,
construct a complete glossary from the various referenced glossary files,
and send this back to the requester. That would allow your "smart" browser
to do local searches for words.
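The no-query case can be sketched the same way: merge the referenced glossaries in rule order, let earlier glossaries win unless a per-term preference says otherwise, and send the result as one HTML definition list that a browser could fetch once and search locally. As before, the in-memory glossaries, the placeholder term, and the function names are assumptions for illustration.

```python
from __future__ import annotations

import html

def merge_glossaries(order: list[str],
                     prefer: dict[str, str],
                     glossaries: dict[str, dict[str, str]]) -> dict[str, str]:
    """Combine glossaries into one term -> HTML mapping."""
    merged: dict[str, str] = {}
    for name in order:
        for term, entry in glossaries.get(name, {}).items():
            merged.setdefault(term, entry)  # first glossary in order wins
    for term, name in prefer.items():
        if term in glossaries.get(name, {}):
            merged[term] = glossaries[name][term]  # explicit preference wins
    return merged

def render_dl(merged: dict[str, str]) -> str:
    """Render the merged glossary as an HTML <DL>, sorted by term."""
    lines = ["<DL>"]
    for term in sorted(merged, key=str.lower):
        # Terms are escaped; entries are assumed to be HTML already.
        lines.append(f"<DT>{html.escape(term)}<DD>{merged[term]}")
    lines.append("</DL>")
    return "\n".join(lines)

glossaries = {
    "networking": {"XYZ": "networking sense of XYZ"},
    "publishing": {"XYZ": "publishing sense of XYZ", "kerning": "entry for kerning"},
}
merged = merge_glossaries(["networking", "publishing"], {"XYZ": "publishing"}, glossaries)
print(render_dl(merged))
```

Once the browser has this list, looking up a word is a local operation with no further connections to the server.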

Flexible, powerful, easy for authors, and no need to mess with HTML.

Michael Johnson
Relay Technology, Inc.