Re: About sniffing from noah_mendelsohn@us.ibm.com on 2010-03-22 (www-tag@w3.org from March 2010)

From: <noah_mendelsohn@us.ibm.com>
Date: Mon, 22 Mar 2010 15:22:56 -0400
To: John Kemp <john@jkemp.net>
Cc: "www-tag@w3.org WG" <www-tag@w3.org>
Message-ID: <OF335C142A.E2BD57FE-ON852576EE.00684C19-852576EE.006A7A49@lotus.com>
> I'll update the agenda to list these and a link to this email

BTW, note that there are two copies of the agenda [1,2], and it's 
important that they both be edited when changes are made.  I haven't quite 
unscrambled the CVS conflicts I'm seeing, but I see some hints that the 
two copies got out of sync, perhaps due to your edits or others.  I 
believe that I have managed to reconstruct in both copies the edits you 
made, but there is some chance I didn't succeed, and in the future I can't 
promise that changes made in only one copy won't later be lost.  FYI, my 
usual procedure is to check out one copy, edit it, check it in, 
then copy that blindly over a checked-out version of the other copy.  So, 
if you only edit one copy, you risk getting your edits backleveled, and 
it's certain that readers of the other copy won't see them.

Noah


[1] http://www.w3.org/2001/tag/2010/03/24-agenda.html
[2] http://www.w3.org/2001/tag/tag-weekly

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








John Kemp <john@jkemp.net>
Sent by: www-tag-request@w3.org
03/22/2010 11:17 AM
 
        To:     "www-tag@w3.org WG" <www-tag@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        About sniffing


Hello,

Below, I have written some suggested goals for our f2f discussion (I'll 
update the agenda to list these and a link to this email), and some notes 
from my recent re-reading of the Authoritative Metadata finding - 
http://www.w3.org/2001/tag/doc/mime-respect. I realize I can't make this 
"required reading" at this late stage, but I would suggest that people at 
least read my notes below, and preferably the Authoritative Metadata 
finding itself, to get a good background for this issue. 

I believe this email to be related to ACTION-399.

Regards,

- johnk

Sniffing discussion goals
-------------------------------

* Discuss what (if anything) can be done by the TAG to improve the 
situation of content-type mis-labeling errors and reporting.
* Discuss the requirements for a content-sniffing algorithm given the 
constraints discussed in Authoritative Metadata, and in relation to the 
content-sniffing draft proposed in 
http://tools.ietf.org/html/draft-abarth-mime-sniff-04 
* Establish any updates to Authoritative Metadata and Self-describing Web 
findings based on these discussions.
* Discuss other instances of sniffing, as noted by Larry in email to TAG:

"I think this general rule should apply to MIME 
types, HTML versions, charset labels and language
tags (four kinds of 'sniffing' currently covered
by the HTML document.)"

Reading Authoritative Metadata (AM)
-----------------------------------------------

Arguments *against* the summary of key points from the AM finding:

i) Why should metadata in an "encapsulating container" be authoritative? 
What happens when the container is separated from the contained entity? 
What about publishing chains where mis-labeling occurs?
ii) Inconsistency between representation data and metadata is an error 
which MUST NOT be silently ignored. To make the situation better, we need 
to provide guidance that supports correcting such errors - browser plugins 
that report inconsistencies to the origin server owner? Content-management 
system plugins that sniff uploaded content and report errors?
iii) Why must an agent not override content-type without user consent? 
Source view vs content view - when source is plain text and content is an 
interpretation of plain text it must be possible to display both...
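
As a rough illustration of the reporting idea in point ii), here is a minimal 
Python sketch of a check that a content-management system might run on 
uploaded content. It is not taken from the AM finding or from the sniffing 
draft: the SIGNATURES table and the names sniff() and check_label() are 
hypothetical, and only a few well-known binary magic numbers are covered. 
The declared type is never overridden; a mismatch is only reported.

```python
# Hypothetical signature table: a handful of well-known magic numbers only.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff(body: bytes):
    """Return the media type suggested by the leading bytes, or None."""
    for magic, media_type in SIGNATURES:
        if body.startswith(magic):
            return media_type
    return None


def check_label(declared: str, body: bytes):
    """Return a warning if the declared type contradicts the bytes, else None.

    The declared Content-Type stays authoritative; this only reports.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    declared_type = declared.split(";", 1)[0].strip().lower()
    sniffed = sniff(body)
    if sniffed is not None and sniffed != declared_type:
        return f"declared {declared_type!r} but content looks like {sniffed!r}"
    return None


if __name__ == "__main__":
    png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
    print(check_label("text/plain; charset=utf-8", png))
    print(check_label("image/png", png))
```

Text formats are deliberately left out of the table, since (per point iii) 
HTML source served as text/plain can be a legitimate choice rather than an 
error.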

"For Web architecture, a design choice has been made that metadata 
received in an encapsulating container MUST be considered authoritative" - 
why!? Section 3 attempts to describe why....

Why (summarized):

i) Make media types descriptive of intended interpretation, not just an 
indication of format.

This requires that media types are properly descriptive and registered 
accurately. This also doesn't deal with the mis-labeling problem (i.e. the 
media type is there but doesn't accurately describe the proper 
interpretation). 

In order to make this true, servers should sniff and detect mislabeled 
content received from clients too. 

ii) If container metadata is not used, and sniffing is required, only one 
representation of the content is possible - thus container metadata MUST 
be possible. 

Agree with this

iii) Using the container metadata model allows easier dispatch to 
"handlers/plugins" without recourse to inspecting the message body

Agree with this

What to do when no metadata is supplied:

* If Content-type is EMPTY, UA MAY sniff

* If Content-type is application/octet-stream, UA should ask the user 
(this is not said in AM, but appears to be common convention - AM says: "Server 
managers (webmasters) SHOULD NOT specify an arbitrary Internet media type 
(e.g., "text/plain" or "application/octet-stream") when the media type is 
unknown.")

Servers and clients should be more circumspect about labeling content - 
and say "I don't know" (empty Content-type) more often. 

From AM: "Instead of specifying a default for metadata, it is better for 
representations to be sent without that metadata. That allows the 
recipient to guess the metadata instead of being forced to either accept 
incorrect metadata or be tempted to violate Web architecture by ignoring 
it."

and...

"It is better to send no media type if the resource owner has failed to 
define one for a given representation."
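
The rules under "What to do when no metadata is supplied" could be sketched 
roughly as follows in Python. This is only an illustration of the notes 
above, not an algorithm from AM or from the sniffing draft: the Action enum 
and the decide() function are hypothetical names, and the "ask the user" 
branch reflects the common convention mentioned above rather than anything 
AM requires.

```python
from enum import Enum


class Action(Enum):
    SNIFF = "sniff"        # no metadata supplied: the UA MAY guess
    ASK_USER = "ask_user"  # application/octet-stream: common convention
    TRUST = "trust"        # container metadata is authoritative


def decide(content_type):
    """Map a Content-Type header value (None if absent) to an action."""
    # Strip parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if not media_type:
        return Action.SNIFF
    if media_type == "application/octet-stream":
        return Action.ASK_USER
    return Action.TRUST


if __name__ == "__main__":
    for header in (None, "", "application/octet-stream", "text/html; charset=utf-8"):
        print(repr(header), "->", decide(header).value)
```

Note that sniffing is reached only when the server has said "I don't know"; 
any supplied media type other than application/octet-stream is taken as 
authoritative.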

Conclusion: Authoritative Metadata finding accurately describes the issues 
and does its best to give good guidance.
Received on Monday, 22 March 2010 19:23:40 UTC