- From: Robert Hazeltine <rhazltin@zeppo.nepean.uws.edu.au>
- Date: Fri, 22 Dec 1995 11:11:57 +1000 (EET)
- To: Martijn Koster <m.koster@webcrawler.com>
- Cc: Davide Musella <davidmsl@anti.tesi.dsi.unimi.it>, Mailing list di html <www-html@w3.org>, musella@dsi.unimi.it
Hi Martijn,

On Thu, 21 Dec 1995, Martijn Koster wrote:

> At 4:06 PM 12/20/95, Davide Musella wrote:
> >Hello to everybody.. Here is the new version of the meta-tag draft.
>
> This one keeps going in various form(u)s :-)
>
> First of all, I don't think HTML tags are the ideal place for generalised
> meta-information about documents, because it is limited to HTML, only allows
> a single viewpoint, etc. I'd much prefer to see a URC, or at least the use
> of LINK elements to point to separate documents with META data (as
> suggested by Murray Maloney), which you can then negotiate.
>
> Having said that, it seems to refuse to die, and it is targeted as a
> quickly deployable interim measure, so I thought I'd better comment on
> the parts I disagree with (summary at the end).

There may be some confusion as to why this will not go away, and I do not think it has anything to do with being a quick fix for URC and related retrieval issues. There is, in my view, a cogent need for document identification, much as a book or a pamphlet is identified by a front page with title, author and imprint information. Hackers have long worked out ways of identifying files and versions (header information in database files, the "MZ" signature in DOS binaries, or the information given up by the UNIX file command); this discussion should be about finding analogous conventions, human readable, for text files. Taking this approach puts the challenge for analysts and programmers like yourself into a framework. While it remains unsolved, documents like the "Dublin Core", which basically reproduces a catalogue and/or document delivery system, will keep turning up like a bad penny. Since the vast majority of documents on the Internet are not produced or managed by librarians and their ilk, the final solution should be an agreed minimal list of elements.
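To make the analogy concrete: the conventions hackers use, such as the "MZ" signature mentioned above, amount to checking a few leading "magic" bytes, which is essentially what the UNIX file command does. A minimal present-day Python sketch (purely illustrative; the MAGIC table here is an assumed sample, not an exhaustive list):

```python
# Sketch of magic-number identification, as done by file(1).
# The "MZ" entry is the DOS-binary signature cited in the text;
# the other entries are illustrative additions.
MAGIC = {
    b"MZ": "DOS/Windows executable",
    b"\x7fELF": "ELF binary",
    b"GIF8": "GIF image",
}

def identify(data: bytes) -> str:
    """Return a file-type guess based on leading magic bytes."""
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    return "unknown"
```

The point of the analogy is that text documents on the Web have no such agreed self-identifying header, which is the gap a minimal META element list would fill.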
I think that Davide should be commended for trying to identify such a list and bring some sensible resolution to a problem identified by lots of people. A list of such elements in the HTML3 spec is not going to detract from the work on URC or whatever is the latest attempt at document retrieval. I would not even attempt to guess how many "index.html" files there are in the world, or the associated symlinks, or the .shtml variations. Since copy commands change file dates, dates cannot be relied on as an identifier. Having taken just two of the elements that hackers would use to identify application files, where do you look for adequate identification?

> > 3. HTTP-EQUIV.
>
> I'd much prefer to change the focus of this draft away from HTTP-EQUIV,
> and concentrate on NAME, to which http-equiv might be added only if
> required. That way you separate the two purposes: embedding META information
> in HTML, and associating HTTP headers with HTML documents.

As a matter of drafting technique, it would be better to identify the HTTP-EQUIV elements first (because they have their own value independent of HTML) and then go on to cover the HTML material.

> The draft contains no rationale for sending generalised META info in HTTP,
> so let's think about this... I can think of a few reasons:
>
> 1: to allow retrieval of this info via a HEAD request
>
> One could argue this is useful for indexing, but in practice robots don't
> do this: they want the entire document before deciding what/how to index
> it, at which point they can parse the HTML and use the META info straight
> away from there. Also, because this is not widely implemented, doing
> a HEAD instead of a GET would usually result in receiving no META data and
> requiring a GET after all; this double round-trip is reason enough to
> just do a GET, in which case you might as well parse it from the HTML
> <HEAD> element.

I find the logic that, because something is not current practice, it should not be part of the specification, a bit strange.
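The separation of the two purposes described above, META NAME pairs as embedded document meta-information versus META HTTP-EQUIV pairs as candidate HTTP headers, is exactly what a robot does when it parses the META info "straight away" from a fetched document. A present-day Python sketch of that split (illustrative only; the example document and its attribute values are assumptions, not taken from the draft):

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect META elements from a document's <HEAD>, separating
    NAME pairs (document meta-information) from HTTP-EQUIV pairs
    (values a server could emit as HTTP response headers)."""

    def __init__(self):
        super().__init__()
        self.names = {}       # from <META NAME=... CONTENT=...>
        self.http_equiv = {}  # from <META HTTP-EQUIV=... CONTENT=...>

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # html.parser lowercases tag/attribute names
            return
        a = dict(attrs)
        if "name" in a:
            self.names[a["name"]] = a.get("content", "")
        elif "http-equiv" in a:
            self.http_equiv[a["http-equiv"]] = a.get("content", "")

# Hypothetical example document, not from the draft itself.
doc = """<HEAD>
<META NAME="author" CONTENT="Davide Musella">
<META HTTP-EQUIV="Expires" CONTENT="Tue, 04 Dec 1995 21:29:02 GMT">
</HEAD>"""

parser = MetaExtractor()
parser.feed(doc)
```

After feeding the document, `parser.names` holds the embedded meta-information and `parser.http_equiv` holds the header-like pairs, illustrating why a robot that has already done a GET needs nothing from a separate HEAD request.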
On the point of rationale being included in a specification: that is, in my view, a failing of Internet specifications. There are too many specs that read more like a thesis than a specification, which detracts from the inherent value of the specification. If it is absolutely necessary to do something along the lines of including rationales, they should only be mentioned in a preamble or objects statement. There would probably be less useless debate about what is *really* meant, like the current discussion on whether periods are part of a URL.

Rob...
Received on Thursday, 21 December 1995 19:14:33 UTC