
Re: Revisiting Authoritative Metadata (was: The failure of Appendix C as a transition technique)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Mon, 04 Mar 2013 01:50:48 +0100
To: Robin Berjon <robin@w3.org>
Cc: "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <m3k7j8pmcerblomjpeasp3dd4e9qjocqma@hive.bjoern.hoehrmann.de>
* Robin Berjon wrote:
>The first relies on the infamous "sending text/html with the intent of 
>having it render as text/plain" example. I've been hearing that example 
>for a decade now. Apart from being devoid of technical motivation (since 
>you can use <plaintext>) is there a second example? Notably, are there 
>examples involving non-text media types?
>It seems to proceed on the assumption that a sender indicating multiple 
>interpretations for the same representation is a key architectural 
>feature. I think this begs the question: why? And assuming someone does 
>have a use case, is it worth the cost of requiring a content type on 
>every response and of introducing the sort of frailty that leads to 
>sniffing? I've been doing web hacking for something like 18 years by 
>now, including some stuff that I'm pretty sure would be considered 
>rather exotic, but using a different media type for the same 
>representation simply has never come up. Not even in a freak prototype.

As far as I can tell, the concern is that, when multiple interpretations
are possible, the author can clarify which one is intended. Plain text
is a simple example because, being "plain", it lacks other identifying
features, such as the character encoding used for the document.
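
Robin's text/html-versus-text/plain case can be made concrete with a small
Python sketch. It is not from the original message, and `render_for_display`
is a hypothetical helper: the same characters end up as markup or as
literal text depending only on the declared media type.

```python
import html

def render_for_display(media_type: str, body: str) -> str:
    """Turn a text body into HTML for display, according to its declared media type."""
    if media_type == "text/html":
        return body  # the markup is interpreted
    if media_type == "text/plain":
        # the markup is escaped, so a browser shows it literally
        return "<pre>" + html.escape(body) + "</pre>"
    raise ValueError(f"unsupported media type: {media_type}")

source = "<p>Use <b>bold</b> sparingly.</p>"
print(render_for_display("text/html", source))
print(render_for_display("text/plain", source))
```

Nothing in the bytes themselves says which of the two readings the author
meant; only the label does.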

When I bought my first computer, it was the norm that text files you
made on one computer did not work on the next one if you did anything
unusual, like typing my name into them, because different systems had
different local conventions (DOS and Windows on the same computer, for
instance).

I would not want to write "#encoding=windows-1252" every time I make a
new text file, nor would I want to be bothered by a text editor adding
that for me every time (and possibly hiding it, so that I end up
forgetting about it when I later run the file through `grep` or other
tools). With that historical perspective, the Content-Type header is a
considerable improvement.
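
A minimal Python sketch of the point, assuming the standard library's
`email.message.Message` is used to parse the header; `decode_text_body`
and its `us-ascii` fallback are illustrative assumptions, not part of the
original message. The bytes spell my surname in windows-1252; read as DOS
code page 437, the same bytes come out differently.

```python
from email.message import Message

def decode_text_body(content_type: str, body: bytes,
                     default_charset: str = "us-ascii") -> str:
    """Decode a text body using the charset parameter of its Content-Type.

    default_charset is an assumed fallback for when no charset is declared.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the lower-cased charset parameter, or default_charset if absent.
    charset = msg.get_content_charset(failobj=default_charset)
    # errors="replace" substitutes U+FFFD for bytes the charset cannot decode.
    return body.decode(charset, errors="replace")

raw = b"H\xf6hrmann"  # "Höhrmann" as stored in windows-1252
print(decode_text_body("text/plain; charset=windows-1252", raw))  # Höhrmann
print(decode_text_body("text/plain; charset=cp437", raw))         # H÷hrmann
```

The file itself carries no marker; the label travels alongside it, which
is exactly what a header (or filesystem metadata) provides.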

(These problems suggest to me that you would rather put "identifying"
information in the filesystem, and as a consequence convey it through
network protocols at a higher level than "file contents", meaning you'd
end up with something like Content-Type even under ideal circumstances.)

Examples involving binary types should be easy to come up with. Many
formats re-use the PKZIP format; you would not want applications to treat
.jar, .docx, .xpi, and any number of other formats as ordinary .zip files.
Conversely, you would want zip-compatible applications to treat
self-extracting .exe files as zip files: because a file can be a valid
.exe and a valid .zip at the same time, you can let "novice" users
extract the archive by running it, while more experienced users can
avoid running the .exe code.
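
A rough Python sketch of both cases, using only the standard `zipfile`
module. The helpers `make_zip` and `sniff`, the archive contents, and the
64-byte fake executable stub are all hypothetical illustrations; the
".docx-like" package is heavily simplified and is not a valid Word
document.

```python
import io
import zipfile

# Leading bytes of a ZIP local file header, and of a DOS/Windows executable.
ZIP_MAGIC = b"PK\x03\x04"
EXE_MAGIC = b"MZ"

def make_zip(files: dict[str, bytes]) -> bytes:
    """Build a ZIP archive in memory from a name -> contents mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def sniff(data: bytes) -> str:
    """Naive sniffer that looks only at the first bytes of the payload."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(EXE_MAGIC):
        return "exe"
    return "unknown"

# Two different formats built on the same container.
docx_like = make_zip({"[Content_Types].xml": b"<Types/>",
                      "word/document.xml": b"<w:document/>"})
plain_zip = make_zip({"notes.txt": b"hello"})
print(sniff(docx_like), sniff(plain_zip))  # zip zip

# A stand-in for a self-extracting archive: a fake stub followed by ZIP data.
# The sniffer sees an .exe, yet zipfile still opens it, because ZIP readers
# locate the central directory from the end of the file.
sfx = EXE_MAGIC + b"\x00" * 62 + make_zip({"readme.txt": b"hi"})
print(sniff(sfx))                                           # exe
print(zipfile.ZipFile(io.BytesIO(sfx)).read("readme.txt"))  # b'hi'
```

The distinction the magic number cannot make is what the media type
carries, e.g. application/zip versus application/java-archive or
application/vnd.openxmlformats-officedocument.wordprocessingml.document.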

>The second paragraph is simply untrue. Looking at the first bytes of a 
>payload to read a magic number or some such is not more expensive than 
>reading the media type. It is certainly less expensive than having to 
>read both the media type and the first few bytes because you know that 
>the media type will be broken.

How do you read the first bytes when the data is compressed and you do
not understand the compression scheme? An intermediary might want to
understand what kind of data is being transmitted, but you do not want
to have to upgrade the intermediary every time a new compression scheme
comes along. Does that happen often? How important is this feature? We,
especially the "we" from over 20 years ago, do not quite know...
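
A minimal Python sketch of that intermediary, under stated assumptions:
headers arrive as a dict with lower-cased names, the only content coding it
understands is gzip, `x-hypothetical` stands for some newer coding it has
never heard of, and the sniffing rule is deliberately crude. None of these
names come from the original message.

```python
import gzip

def sniff(body: bytes) -> str:
    """Very rough content sniffing on uncompressed bytes (illustrative only)."""
    head = body.lstrip()[:15].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return "application/octet-stream"

def classify(headers: dict[str, str], body: bytes) -> str:
    """Decide what kind of data a message carries, as an intermediary might.

    Assumes header names have already been lower-cased.
    """
    declared = headers.get("content-type")
    if declared:
        # The declared type is readable even when the body is compressed
        # with a scheme this intermediary does not implement.
        return declared.split(";")[0].strip().lower()
    coding = headers.get("content-encoding", "identity").strip().lower()
    if coding == "gzip":
        body = gzip.decompress(body)
    elif coding != "identity":
        return "unknown"  # cannot inspect the bytes without the coding
    return sniff(body)

page = b"<!DOCTYPE html><p>hi</p>"
compressed = gzip.compress(page)
print(compressed[:2])  # b'\x1f\x8b': the gzip magic number, whatever the content
print(classify({"content-type": "text/html; charset=utf-8",
                "content-encoding": "x-hypothetical"}, b"\x00\x01"))  # text/html
print(classify({"content-encoding": "gzip"}, compressed))             # text/html
print(classify({"content-encoding": "x-hypothetical"}, b"\x00\x01"))  # unknown
```

Without the declared type, the first bytes of an encoded body only ever
reveal the compression scheme, never the kind of data inside it.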
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Monday, 4 March 2013 00:51:21 UTC
