- From: Martijn Koster <m.koster@webcrawler.com>
- Date: Thu, 21 Dec 1995 13:30:32 -0700
- To: davidmsl@anti.tesi.dsi.unimi.it (Davide Musella), www-html@w3.org (Mailing list di html)
- Cc: musella@dsi.unimi.it
At 4:06 PM 12/20/95, Davide Musella wrote:

>Hello to everybody.. Here there is the new version of the meta-tag draft.

This one keeps going in various form(u)s :-)

First of all I don't think HTML tags are the ideal place for generalised meta-information about documents, because it's limited to HTML, only allows a single viewpoint, etc. I'd much prefer seeing a URC, or at least the use of LINK elements to point to separate documents with META data (as suggested by Murray Maloney), which you can then negotiate.

Having said that, it seems to refuse to die, and it is targeted as a quickly deployable interim measure, so I thought I'd better comment on the parts I disagree with (summary at the end).

> Now the synopsis of the META HTTP-EQUIV Tag is not severe, allowing so
> the use of different key words to define the same things.

I had to read this twice before I understood. This may be better re-phrased as "Currently (in HTML 2.0) the synopsis is not well-defined".

> 3. HTTP-EQUIV.

I'd much prefer to change the focus of this draft away from HTTP-EQUIV, and concentrate on NAME, to which http-equiv might be added only if required. That way you separate the two purposes: embedding META information in HTML, and associating HTTP headers with HTML documents.

The draft contains no rationale for sending generalised META info in HTTP, so let's think about this... I can think of a few reasons:

1: to allow retrieval of this info via a HEAD request

One could argue this is useful for indexing, but in practice robots don't do this: they want the entire document before deciding what/how to index it, at which point they can parse the HTML and use the META info straight away from there. Also, because this is not widely implemented, doing a HEAD instead of a GET would usually return no META data and require a GET after all; this double round trip is enough reason to just do a GET, in which case you might as well parse it from the HTML <HEAD> element.

2: to allow proxies and clients access to this without parsing the HTML

That is lazy, and current wisdom says servers should not have to do this stuff if the client can do it.

3: to set/override HTTP server settings

One could e.g. put in an HTTP-EQUIV="Content-Language" header, if your server does support HTTP-EQUIV, but not configuration of Content-Language. This would be a computationally expensive way of doing it, even if the server caches this info. Much better to do that out of band. Dubious from a security aspect too (e.g. Location:)

4: to get HTTP-headers via other protocols (file://, gopher://...)

I find this dubious:

- if you have conflicting values in your server configuration and your HTTP-Equiv, what is a server to do? What is a client to do?
- I doubt many browsers' architectures allow this to be done easily.
- this only works for HTML. Why not solve it properly in the protocol so it works for all media types?

So I don't think it's a good idea. If there is a good reason it should be mentioned in the draft.

> It is possible to use any text string [in http-equiv],

No, not if this is to go straight over the wire in an HTTP header: you're syntactically constrained by the HTTP spec here. The constraints should be explicitly mentioned, or at the very least referenced. Of course if you do just NAME and CONTENT you can relax this.

This also opens up possible clashes in the name spaces between independent extensions by the HTML META tag and the HTTP spec. For example, if someone puts in an http-equiv "Payment: loads money", and later the HTTP spec decides to add a header "Payment" with a rigorous syntax, then you have servers sending bogus headers. For this reason I'd be a lot happier if you were required to prepend "Meta-" to any unspecified string. I'd be even happier if general strings weren't allowed at all (there is little point if the syntax and semantics aren't defined).

> but if you want to define
> these properties you have to use the following words:
>
> ...
> expire: to indicate the expire date of the document
> language: to indicate the language of the document

Without a syntax you can't do much with those fields... Anyway this is strange; in HTTP the respective headers are "Expires" and "Content-Language", and have pre-determined syntaxes. It seems to me these should be the same, and for them to be used the syntax and semantics need to be mentioned or referenced.

> public (Boolean): to indicate if the document is available to
> everybody or not

First of all I question the value of this; whether a document is public or not should be determined by the protocol or policy, not the document. What if a browser sees a Public: 'NO', should it drop the document? You can't guarantee that. You also can't specify target groups who have access, so as a security measure it's not all that valuable.

Anyway, if you specify a type 'Boolean' you need to specify the values (0-1, YES-NO, ON-OFF ?), otherwise it doesn't really help.

> An HTTP server must process these tags for an HEAD HTTP request,
> Do not name an HTTP-EQUIV attribute the same as a response header
> that should typically only be generated by the HTTP server. Some
> inappropriate names are "Server", "Date", and "Last-Modified".
> Whether a name is inappropriate depends on the particular server
> implementation. It is recommended that servers ignore any META
> elements that specify HTTP equivalents (case insensitively) to their
> own reserved response headers.

This brings me to a feeling of unease about this draft: Is it a way of associating meta-data info with a document, or is it a way of configuring a server, or conveying HTTP info in HTML via other protocols? Don't these conflict somewhere?

> 4. NAME.
>
> This attributes can be used to define some properties such as
> author, publication date etc. If absent the name can be assumed to be
> the same as the value of HTTP-EQUIV.

According to section 3, the HTTP-Equiv may also be absent, so both may be absent, leaving the Content useless :-) At least one of the two should be required. Personally I'd prefer the emphasis to be on the NAME=>CONTENT pair rather than the HTTP-EQUIV=>CONTENT pair.

> 5. CONTENT
>
> Used to supply a value for a named property.
> If it's used with the HTTP-EQUIV it can contain more than one single
> information; it is possible to use the Boolean operator (AND, OR) to
> insert a Boolean definition of the field.
> The AND operator will be represented by the SPACE (ASCII[32]) and the
> OR operator by the COMMA (ASCII[44]).
> The AND operator is processed before the OR operator. So a string
> like this: "Red ball, White ball" means :"ball AND (red OR white)".
> Examples:
>
> <META HTTP-EQUIV= "Keywords" CONTENT= "Italy Product, Italy Tourism">
>
> The spaces between a comma and a word or vice versa are ignored.

I find this strange and confusing. First of all, this holds true only for those fields you have defined in section 3, not for HTTP headers. Secondly, why not simply say "Keyword phrases are separated by commas" without delving into a non-obvious boolean system?

> 6. Cataloging an HTML document
>
> These 'keywords' were specifically conceived for exaustively and
> completely catalogue the HTML document.

I guess you mean "exhaustively" and "cataloguing (?)"? I don't think you should claim anything "exhaustive and complete", because things, especially meta-data, never are.

> This allows the software agents to index at best your own document.

"This allows you to aid web robots in indexing your document."?

> To do a preliminary indexing, it's important to use at least the
> http-equiv meta-tag "keywords".

This sentence doesn't run...

I'm also missing a "Security Considerations" section, which seems much needed to warn about people spamming and abusing this tag, especially when it could override HTTP-proper headers.
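
To make the name-space and override concerns above concrete, here is a minimal sketch of how a cautious server might filter META HTTP-EQUIV pairs before turning them into response headers. It is not part of the draft or of any server: the function name `filter_http_equiv`, the `RESERVED` and `KNOWN` sets, and the input format (a list of name/content pairs already extracted from the HTML) are all assumptions made for illustration. The reserved names come from the quoted draft text, with Location added because it was flagged as a security concern; the "Meta-" prefix follows the suggestion made earlier in this message.

```python
import re

# HTTP header field names must be "tokens"; this character class follows the
# token character set of the HTTP/1.1 grammar.
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# Names the quoted draft calls inappropriate, plus Location.
# The exact set is an assumption for this sketch.
RESERVED = {"server", "date", "last-modified", "location"}

# Headers HTTP already defines with a fixed syntax (illustrative subset only).
KNOWN = {"expires", "content-language"}


def filter_http_equiv(pairs):
    """Return the (name, value) headers a cautious server might emit."""
    headers = []
    for name, content in pairs:
        name = name.strip()
        if not TOKEN_RE.fullmatch(name):
            continue  # not a legal header field name
        if "\r" in content or "\n" in content:
            continue  # line breaks would allow header injection
        lowered = name.lower()
        if lowered in RESERVED:
            continue  # the server generates these itself
        if lowered not in KNOWN and not lowered.startswith("meta-"):
            # Keep unspecified names out of HTTP's own name space.
            name = "Meta-" + name
        headers.append((name, content.strip()))
    return headers


example = [
    ("Payment", "loads money"),
    ("SERVER", "Bogus/1.0"),
    ("Content-Language", "it"),
    ("Bad Name", "x"),
]
print(filter_http_equiv(example))
```

With this input, only the prefixed "Meta-Payment" pair and the Content-Language pair survive. Note that the sketch checks names only: it does not validate that a known header's value follows the syntax HTTP defines for it, which is the other half of the problem raised above.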

Sorry to be a bit negative here, but I really think this should be well thought-out if it is to end up in a spec the entire networking community will have to live with.

So in summary: rather than (just) this meta tag, look at using LINK to associate META data, seriously reconsider (euphemism for "don't do") general HTTP-EQUIV, specify syntax as well as semantics for the fields, and consider the security issues.

-- Martijn

Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html

Received on Thursday, 21 December 1995 16:31:02 UTC