- From: Robert Burns <rob@robburns.com>
- Date: Fri, 31 Aug 2007 14:25:15 -0500
- To: Roy T.Fielding <fielding@gbiv.com>
- Cc: "public-html@w3.org WG" <public-html@w3.org>
Hi Roy,

On Aug 31, 2007, at 1:20 PM, Roy T. Fielding wrote:

> On Aug 31, 2007, at 10:59 AM, Robert Burns wrote:
>> On Aug 31, 2007, at 12:31 PM, Roy T. Fielding wrote:
>>> On Aug 31, 2007, at 8:01 AM, Robert Burns wrote:
>>>>> One of the main reasons for this is because the W3C hasn't made
>>>>> it clear to developers and browser manufacturers that it's the
>>>>> media-type ("application/xhtml+xml") that people need to get
>>>>> used to, not just the XML syntax of XHTML, and it's the media-
>>>>> type that makes the document XHTML.
>>>>
>>>> We've been discussing this at length on the "review of content
>>>> type rules by IETF/HTTP community" thread (see also the wiki
>>>> page [1]). I think a more accurate way to think of it is that a
>>>> file's type is determined by the internals of the file and the
>>>> authoring tool.
>>>
>>> No, that is the completely wrong way to think of it. Media types
>>> define how a given sequence of bytes are intended to be processed
>>> by the recipient. I can author dozens of types in vim. It is
>>> impossible to determine the media type of content by sniffing.
>>> It is sometimes possible to determine a range of possible media
>>> types and pick one based on configuration, but there are always
>>> exceptions that will cause such a pick to be wrong.
>>
>> I'm not sure what I said conflicts with what you're saying. My
>> point is that an author and the tool the author uses creates a
>> file of a certain type (even before it reaches an HTTP server). No
>> sniffing is necessary at this stage because the author and
>> authoring tool combination already know the type of file they're
>> creating. As you said "I can author dozens of types in vim". And
>> you are the one in charge of deciding what type you're authoring.
>> You may be saving it to disk with each edit and each time the HTML
>> file you're authoring is made available as a PNG file through an
>> http daemon. Does that misconfigured server say anything about the
>> file type you're authoring in vim? No and it shouldn't
>
> That is still wrong. Media Type != Data Format. Authoring tools know
> data formats (at least supersets, like text/*). Authoring tools never
> know HTTP's value for Content-Type. Never.

I'm trying to understand what you're saying, but you're using many different terms here.

• "Media Type != Data Format". OK; however, data formats are often expressed through media types, right?

• "Authoring tools know data formats (at least supersets, like text/*)". Isn't text/* a media type? So here the authoring tool knows the data format as expressed as a media type like "text/plain". Also, for an authoring tool that authors only HTML (not plain text), wouldn't that data format be expressed as the media type "text/html"? So if data formats are expressed with the same names as media types, where is the difference? Is media type only about expressing how the author wants the data format handled (e.g., as text/plain instead of text/html)? If so, I think we're missing a place for metadata that expresses the file's data format (and allows that to be efficiently retrieved over the network).

• "Authoring tools never know HTTP's value for Content-Type." Here I think is the problem. It's the HTTP Content-Type that should be set based on the author's and the authoring tool's specification, and therefore there's no reason for the authoring tool to know HTTP's value. Rather, the HTTP Content-Type value should be dependent on the author / authoring tool determination.

> You are thinking of Content-Type as a data format. That is not its
> purpose in MIME and HTTP.

Would you say that media types can express a data format, but that MIME and HTTP instead use them to express the author's desired handling of the data format?
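
To make the distinction under discussion concrete, here is a minimal Python sketch, not taken from either correspondent. It assumes that one and the same byte sequence can be sent under several registered media types, and that the label rather than the bytes tells the recipient what to do. The `HANDLING` table and the `describe` function are hypothetical names invented for this illustration, and the handling descriptions are assumptions about typical recipient behavior.

```python
# Hypothetical illustration: one byte sequence, several plausible labels.
DOCUMENT = b"<html><body><p>Hello</p></body></html>"

# Assumed mapping from a media type label to the handling a recipient
# might apply. The bytes are identical in every case.
HANDLING = {
    "text/html": "render as a web page",
    "text/plain": "display the markup as source text",
    "application/octet-stream": "offer the bytes for download",
}


def describe(body: bytes, content_type: str) -> str:
    """Return the handling implied by the label, not by the bytes."""
    handling = HANDLING.get(content_type, "unknown handling")
    return f"{len(body)} bytes labeled {content_type}: {handling}"


if __name__ == "__main__":
    for media_type in HANDLING:
        print(describe(DOCUMENT, media_type))
```

In this sketch the data format never changes; only the chosen label does, which is one way to read the claim that a media type records a processing intent.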
>>> If you are going to make rules for sniffing, you need to be honest
>>> about the nature of that beast -- no matter what you define, it will
>>> be wrong some percentage of the time. It is the user's choice to
>>> determine when that is acceptable, not the choice of a standard.
>>
>> Sniffing is certainly a problem. However, browsers vendors are
>> finding sniffing to be more reliable than content-type headers. So
>> there's problems with sniffing and there's problems in the process
>> of affixing and retaining the author/authoring tool intended media
>> type to a file.
>
> No, sniffing is impossible, and the authoring tool doesn't know the
> intended media type.

However, the authoring tool, along with the author, is in the best position to know the intended media type. A big part of the problem is that frequently author != server administrator. If we want to create a seamless process from author to consumer that passes through a network, there needs to be a better way of expressing the media type in the authoring process, one that can be retained throughout delivery until it reaches the final consumer of that authored content. Filename extensions might be used, but a filename extension cannot always express both the data format of a file and its author-intended handling (as might be expressed in the HTTP Content-Type header).

> Media types are a protocol issue that is
> related to the data format, but every data format has at least three
> overlapping potential media types (and usually much more than three,
> since the extension space for media types is bounded only by string
> sizes).

Could you provide an example of these overlapping potential media types? I'm not following you here.

> The only way that a media type can be assigned is when a
> human makes a choice, by various configuration mechanisms, to assign
> such a type. DefaultType is one such choice -- it only becomes a bug
> when authors are ignorant of the configuration choices, which in turn
> is a direct result of sniffing in silence.

Part of the problem here is thinking that an author and the server admin are the same person. Authors may create content which then gets distributed in all sorts of ways beyond their control. Each time the authored content changes hands, there's an opportunity to lose the metadata that accompanies the file, and with it the author's intentions. If changing the data handling (as opposed to the data format) of a file is important, then we should find some better way to retain the metadata with the file content.

Add to this that *nix has evolved beyond the simple filesystems it once had, and it is clear that not every file without a filename extension should necessarily be treated as a text file. More importantly, though, a server shouldn't even be configurable to give a catch-all response when the Content-Type is unknown (that is, when server-side MIMEMagic sniffing, a filename extension lookup, or any other method it uses to determine the Content-Type value fails). This is especially true since it is impossible to determine whether the filename extension metadata is missing or whether it is a null filename extension indicating "text/plain" (and the server also applies DefaultType to unknown extensions, which it treats as null extensions).

Since servers are often repositories for large and diverse groups of users, it is inevitable that files will get loaded without known filename extensions (since we just don't have decent protocols in place to ensure these things). If every upload/save operation to a filesystem in any protocol required a consistent way to store metadata (and one not as fragile and decentralized as filename extensions), then we might expect servers never to have insufficient Content-Type information. Since that's not the case, the server has to allow for this missing metadata.

I think this underscores one of the reasons I don't particularly like the term "media type". It contributes to this ambiguity (it is also easily confused with "media description", where the term "media" has a very different meaning as far as I can tell). If a media type can be used to express a data format and it can also be used to express a Content-Type, then this language does nothing to create clarity in the conversations about the topic. We just get this dizzying array of terms that contributes to everyone talking past one another.

Take care,
Rob
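
The fallback behavior criticized in the message above can be sketched roughly as follows. This is a hedged Python illustration and not the code of any real server: `EXTENSION_TYPES` and `resolve_content_type` are hypothetical names, the extension table is an assumed example configuration, and the `default_type` parameter merely stands in for a catch-all setting such as the DefaultType directive mentioned in the thread.

```python
import os
from typing import Optional

# Assumed extension table; a real server would load this from configuration.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".png": "image/png",
    ".txt": "text/plain",
}


def resolve_content_type(filename: str, default_type: Optional[str]) -> Optional[str]:
    """Pick a Content-Type from the filename extension, else the catch-all."""
    _, ext = os.path.splitext(filename)
    known = EXTENSION_TYPES.get(ext.lower())
    if known is not None:
        return known
    # A missing extension ("README") and an unknown one ("photo.xyz")
    # both end up here, so the recipient cannot tell them apart.
    return default_type


if __name__ == "__main__":
    for name in ("index.html", "README", "photo.xyz"):
        print(name, "->", resolve_content_type(name, "text/plain"))
```

With a catch-all of "text/plain", the extensionless file and the file with an unrecognized extension receive the same label. Passing `None` as the catch-all models a server that declines to guess and sends no Content-Type at all.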
Received on Friday, 31 August 2007 19:25:52 UTC