- From: Robert Burns <rob@robburns.com>
- Date: Fri, 31 Aug 2007 14:25:15 -0500
- To: Roy T.Fielding <fielding@gbiv.com>
- Cc: "public-html@w3.org WG" <public-html@w3.org>
Hi Roy,
On Aug 31, 2007, at 1:20 PM, Roy T. Fielding wrote:
> On Aug 31, 2007, at 10:59 AM, Robert Burns wrote:
>> On Aug 31, 2007, at 12:31 PM, Roy T. Fielding wrote:
>>> On Aug 31, 2007, at 8:01 AM, Robert Burns wrote:
>>>>> One of the main reasons for this is because the W3C hasn't made
>>>>> it clear to developers and browser manufacturers that it's the
>>>>> media-type ("application/xhtml+xml") that people need to get
>>>>> used to, not just the XML syntax of XHTML, and it's the media-
>>>>> type that makes the document XHTML.
>>>>
>>>> We've been discussing this at length on the "review of content
>>>> type rules by IETF/HTTP community" thread (see also the wiki
>>>> page [1]). I think a more accurate way to think of it is that a
>>>> file's type is determined by the internals of the file and the
>>>> authoring tool.
>>>
>>> No, that is the completely wrong way to think of it. Media types
>>> define how a given sequence of bytes are intended to be processed
>>> by the recipient. I can author dozens of types in vim. It is
>>> impossible to determine the media type of content by sniffing.
>>> It is sometimes possible to determine a range of possible media
>>> types and pick one based on configuration, but there are always
>>> exceptions that will cause such a pick to be wrong.
>>
>> I'm not sure what I said conflicts with what you're saying. My
>> point is that an author and the tool the author uses creates a
>> file of a certain type (even before it reaches an HTTP server). No
>> sniffing is necessary at this stage because the author and
>> authoring tool combination already know the type of file they're
>> creating. As you said "I can author dozens of types in vim". And
>> you are the one in charge of deciding what type you're authoring.
>> You may be saving it to disk with each edit and each time the HTML
>> file you're authoring is made available as a PNG file through an
>> http daemon. Does that misconfigured server say anything about the
>> file type you're authoring in vim? No and it shouldn't
>
> That is still wrong. Media Type != Data Format. Authoring tools know
> data formats (at least supersets, like text/*). Authoring tools never
> know HTTP's value for Content-Type. Never.
I'm trying to understand what you're saying, but you're using many
different terms here.
• "Media Type != Data Format": OK, but data formats are often
expressed through media types, right?
• "Authoring tools know data formats (at least supersets, like
text/*)": Isn't text/* a media type? If so, the authoring tool knows
the data format as expressed through a media type like "text/plain".
And for an authoring tool that authors only HTML (not plain text),
wouldn't that data format be expressed as the media type "text/html"?
If data formats are expressed with the same names as media types,
where is the difference? Is a media type only about expressing how the
author wants the data format handled (e.g., as text/plain instead of
text/html)? If so, I think we're missing a place for metadata that
expresses the file's data format (and allows that to be efficiently
retrieved over the network).
• "Authoring tools never know HTTP's value for Content-Type": Here,
I think, is the problem. It's the HTTP Content-Type that should be set
based on the author's and the authoring tool's specification, so
there's no reason for the authoring tool to know HTTP's value.
Rather, the HTTP Content-Type value should depend on the
author / authoring tool determination.
> You are thinking of Content-Type as a data format. That is not its
> purpose in MIME and HTTP.
Would you say that media types can express a data format, but that
MIME and HTTP instead use them to express the author's desired
handling of the data format?
>>> If you are going to make rules for sniffing, you need to be honest
>>> about the nature of that beast -- no matter what you define, it will
>>> be wrong some percentage of the time. It is the user's choice to
>>> determine when that is acceptable, not the choice of a standard.
>>
>> Sniffing is certainly a problem. However, browsers vendors are
>> finding sniffing to be more reliable than content-type headers. So
>> there's problems with sniffing and there's problems in the process
>> of affixing and retaining the author/authoring tool intended media
>> type to a file.
>
> No, sniffing is impossible, and the authoring tool doesn't know the
> intended media type.
However, the authoring tool, along with the author, is in the best
position to know the intended media type. A big part of the problem
is that frequently author != server administrator. If we want to
create a seamless process from author to consumer that passes through
a network, there needs to be a better way of expressing the media
type in the authoring process that can be retained throughout
delivery until it reaches the final consumer of that authored
content. Filename extensions might be used, but the filename
extension cannot always express both the data format for a file and
its author-intended handling (as might be expressed in the HTTP
Content-Type header).
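To illustrate the point about filename extensions, here is a rough
Python sketch. It assumes a hypothetical AuthoredFile record with two
separate fields (data_format and intended_handling are names I made up
for this example, not part of any existing tool or protocol). The same
HTML file can be meant for rendering or for display as source text,
and the ".html" extension alone cannot tell the two cases apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthoredFile:
    """Hypothetical record of what an author knows about a file."""
    name: str
    data_format: str        # what the bytes are, e.g. "text/html"
    intended_handling: str  # how the author wants the recipient to process them


def content_type_header(authored_file):
    """Build the HTTP header from the author's intended handling."""
    return f"Content-Type: {authored_file.intended_handling}"


# Same name, same data format, different intended handling.
page = AuthoredFile("example.html", "text/html", "text/html")
source_listing = AuthoredFile("example.html", "text/html", "text/plain")

print(content_type_header(page))
print(content_type_header(source_listing))
```

Both records share one filename, so anything keyed only on the
extension has to pick a single answer for both.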
> Media types are a protocol issue that is
> related to the data format, but every data format has at least three
> overlapping potential media types (and usually much more than three,
> since the extension space for media types is bounded only by string
> sizes).
Could you provide an example of these overlapping potential media
types? I'm not following you here.
> The only way that a media type can be assigned is when a
> human makes a choice, by various configuration mechanisms, to assign
> such a type. DefaultType is one such choice -- it only becomes a bug
> when authors are ignorant of the configuration choices, which in turn
> is a direct result of sniffing in silence.
Part of the problem here is thinking that an author and the server
admin are the same person. Authors may create content which then gets
distributed in all sorts of ways beyond their control. Each time the
authored content changes hands, there's an opportunity to lose the
metadata that accompanies the file: for the author's intentions to be
lost. If changing the data handling (as opposed to the data format)
of a file is important, then we should find some better way to retain
the metadata with the file content.
Add to this that *nix has evolved beyond the simple filesystems it
once had, and it is clear that not every file without a filename
extension should necessarily be treated as a text file. More
importantly, though, a server shouldn't even be configurable to give
a catch-all response when the Content-Type is unknown (that is, when
server-side MIMEMagic sniffing, the filename extension, and any
other method it uses to determine the Content-Type value have all
failed). This is especially true since it is impossible to determine
whether the filename extension metadata is missing or whether it is a
null filename extension indicating "text/plain" (and the server also
applies DefaultType to unknown extensions, which it treats as null
extensions). Since servers are often repositories for large
and diverse groups of users, it is inevitable that files will get
loaded without known filename extensions (since we just don't have
decent protocols in place to ensure these things). If every
upload/save operation to a filesystem in any protocol required a
consistent way to store metadata (and one not as fragile and
decentralized as filename extensions), then we might expect servers
never to have insufficient Content-Type information. Since that's not
the case, the server has to allow for this missing metadata.
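To make the DefaultType ambiguity concrete, here is a minimal Python
sketch of an extension-based lookup with a configurable fallback. It
is only an illustration: EXTENSION_TYPES and resolve_content_type are
hypothetical names, the table is deliberately tiny, and this is not
how any particular server implements it.

```python
import os

# Hypothetical extension table; a real server would load a far larger one.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".png": "image/png",
}


def resolve_content_type(filename, default_type=None):
    """Return (content_type, reason) for a filename.

    content_type is None when nothing is known and no default is set,
    meaning the server would send no Content-Type header at all.
    """
    _, ext = os.path.splitext(filename)
    ext = ext.lower()
    if ext in EXTENSION_TYPES:
        return EXTENSION_TYPES[ext], "extension"
    # Two different situations that a catch-all default collapses into one.
    reason = "no extension" if not ext else "unknown extension"
    return default_type, reason


for name in ("index.html", "README", "photo.xyz"):
    print(name, resolve_content_type(name, default_type="text/plain"))
```

With a default of "text/plain", both "README" and "photo.xyz" come
back labelled as text even though the lookup failed for different
reasons. With default_type=None the function returns None, and the
missing metadata stays visible to the recipient.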
I think this underscores one of the reasons I don't particularly like
the term "media type". It contributes to this ambiguity (it is also
easily confused with "media description", where the term "media" has
a very different meaning in each case, as far as I can tell). If a media
type can be used to express a data format and it can also be used to
express a Content-Type, then this language does nothing to create
clarity in the conversations about the topic. We just get this
dizzying array of terms that contributes to everyone talking past one
another.
Take care,
Rob
Received on Friday, 31 August 2007 19:25:52 UTC