Re: Meta Tag Draft - New version.

Robert Hazeltine (rhazltin@zeppo.nepean.uws.edu.au)
Fri, 22 Dec 1995 11:11:57 +1000 (EET)


Date: Fri, 22 Dec 1995 11:11:57 +1000 (EET)
From: Robert Hazeltine <rhazltin@zeppo.nepean.uws.edu.au>
To: Martijn Koster <m.koster@webcrawler.com>
Cc: Davide Musella <davidmsl@anti.tesi.dsi.unimi.it>,
Subject: Re: Meta Tag Draft - New version.
In-Reply-To: <v02140802acff54cb00f0@[199.221.45.139]>
Message-Id: <Pine.A32.3.91.951222101533.22873A-100000@zeppo.nepean.uws.edu.au>

Hi Martijn,
On Thu, 21 Dec 1995, Martijn Koster wrote:

> At 4:06 PM 12/20/95, Davide Musella wrote:
> >Hello to everybody.  Here is the new version of the meta-tag draft.
> 
> This one keeps going in various form(u)s :-)
> 
> First of all I don't think HTML tags are the ideal place for generalised
> meta-information about documents, because it's limited to HTML, only allows
> a single viewpoint, etc.  I'd much prefer seeing a URC, or at least the use
> of LINK elements to point to separate documents with META data (as
> suggested by Murray Maloney), which you can then negotiate.
> 
> Having said that, it seems to refuse to die, and it is targeted as a
> quickly deployable interim measure, so I thought I'd better comment on
> the parts I disagree with (summary at the end).

There may be some confusion as to the reason why this will not go away.  
I do not think it has anything to do with being a quick fix for URC and 
related retrieval issues.

There is, in my view, a cogent need for document identification, much the 
same as a book or a pamphlet is identified by a front page with title, 
author and imprint information.  Hackers long ago worked out ways of 
identifying files and versions (like the header information in database 
files, the MZ signature in DOS binaries, or the information given up by 
the UNIX file command); this discussion should be about finding analogous 
conventions for text files that are human readable.

Taking this approach puts the challenge for analysts and programmers 
like yourself into a framework.  While it remains unsolved, documents 
like the "Dublin Core", which basically reproduces a catalogue and/or 
document delivery system, will keep turning up like a bad penny.

Since the vast majority of documents on the Internet are not produced or 
managed by librarians and their ilk, the final solution should be an 
agreed minimal list of elements.  I think Davide should be commended 
for trying to identify such a list and bring some sensible resolution to 
a problem identified by lots of people.

A list of such elements in the HTML3 spec is not going to detract from 
the work on URC or whatever is the latest attempt at document retrieval.

I would not even attempt to guess how many "index.html" files there are 
in the world, or associated symlinks, or the shtml variations.  And 
since copy commands change file dates, dates cannot be relied on as an 
identifier.  Having taken just two of the elements hackers would use to 
identify application files, where do you look for adequate identification?

> > 3. HTTP-EQUIV.
> 
> I'd much prefer to change the focus of this draft away from HTTP-EQUIV,
> and concentrate on NAME, to which http-equiv might be added only if
> required. That way you separate the two purposes: embedding META information
> in HTML, and associating HTTP headers with HTML documents.
> 
As a matter of drafting technique, it would be better to specify 
HTTP-EQUIV first (because those headers have their own value independent 
of HTML) and then go on to cover the HTML material.
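For concreteness, the two kinds of META element under discussion might 
look something like this (a sketch only; the header names and values are 
illustrative, not taken from the draft):

```html
<HEAD>
<TITLE>Example Document</TITLE>
<!-- HTTP-EQUIV: the server may emit this as a real HTTP header,
     e.g. "Expires: Tue, 02 Jan 1996 00:00:00 GMT" -->
<META HTTP-EQUIV="Expires" CONTENT="Tue, 02 Jan 1996 00:00:00 GMT">
<!-- NAME: embedded meta-information only; no HTTP header implied -->
<META NAME="Author" CONTENT="Davide Musella">
</HEAD>
```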

> The draft contains no rationale for sending generalised META info in HTTP,
> so let's think about this... I can think of a few reasons:
> 
> 1: to allow retrieval of this info via a HEAD request
> 
>    One could argue this is useful for indexing, but in practice robots don't
>    do this: they want the entire document before deciding what/how to index
>    it, at which point they can parse the HTML and use the META info straight
>    away from there.  Also, because this is not widely implemented, doing
>    a HEAD instead of a GET would usually result in receiving no META data
>    and requiring a GET after all; this double round-trip is enough reason
>    to just do a GET, in which case you might as well parse it from the
>    HTML <HEAD> element.

I find it a bit strange to argue that, because something is not current 
practice, it should not be part of the specification.
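For what it is worth, the step Martijn describes, where a robot that 
already has the whole document takes the META information straight from 
the HTML <HEAD>, could be sketched along these lines (a present-day 
Python sketch; the element names and values are illustrative, not from 
the draft):

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect NAME/CONTENT and HTTP-EQUIV/CONTENT pairs from META tags."""
    def __init__(self):
        super().__init__()
        self.meta = {}          # entries given with NAME=...
        self.http_equiv = {}    # entries given with HTTP-EQUIV=...

    def handle_starttag(self, tag, attrs):
        # The parser hands us lowercased tag and attribute names.
        if tag != "meta":
            return
        d = dict(attrs)
        if "name" in d:
            self.meta[d["name"].lower()] = d.get("content", "")
        elif "http-equiv" in d:
            self.http_equiv[d["http-equiv"].lower()] = d.get("content", "")

doc = """<HTML><HEAD>
<TITLE>Example</TITLE>
<META NAME="Author" CONTENT="Davide Musella">
<META HTTP-EQUIV="Expires" CONTENT="Tue, 02 Jan 1996 00:00:00 GMT">
</HEAD><BODY>...</BODY></HTML>"""

p = MetaExtractor()
p.feed(doc)
print(p.meta)        # {'author': 'Davide Musella'}
print(p.http_equiv)  # {'expires': 'Tue, 02 Jan 1996 00:00:00 GMT'}
```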

As for including rationale in a specification: that is, in my view, a 
failing of Internet specifications.  There are too many specs that read 
more like a thesis than a specification, which detracts from the 
inherent value of the specification.

If it is absolutely necessary to include rationales, they should be 
mentioned only in a preamble or object statement.  There would probably 
be less useless debate about what is *really* meant, like the current 
discussion on whether periods are part of a URL.

Rob...