
Re: ACTION-308 (part 2) Updates to 'The Self-Describing Web'

From: <noah_mendelsohn@us.ibm.com>
Date: Mon, 1 Feb 2010 17:42:54 -0500
To: John Kemp <john@jkemp.net>
Cc: Larry Masinter <masinter@adobe.com>, "www-tag@w3.org WG" <www-tag@w3.org>
Message-ID: <OF63CE5EBB.22FCF2E1-ON852576BD.007B4DDA-852576BD.007C77F3@lotus.com>
John Kemp wrote:

> On Jan 7, 2010, at 5:21 PM, noah_mendelsohn@us.ibm.com wrote:
> 
> > John Kemp wrote:
> > 
> >> On Jan 6, 2010, at 4:52 PM, noah_mendelsohn@us.ibm.com wrote:
> >> 
> >> [...]
> >> 
> >>> Furthermore, the draft text really doesn't explain how allowance for 
> >>> sniffing would change the rest of the SDW story.
> >> 
> >> And that was deliberate. I am not "allowing sniffing" so much 
> >> as saying, "if you are going to sniff then do it this way". I 
> >> didn't intend to change the meaning of the SDW story at all, or
> >> its relationship to the use of authoritative metadata.
> > 
> > I know, but that's what I'm unhappy about.  I think that once we even 
> > bring up the possibility, we should explain the implications.
> 
> Fair enough.

Good, thanks!

> > I would probably be happier with something close to:
> > 
> > <proposed>
> > For the Web to have the desirable properties described in this finding, 
> > it's essential that content be served with a media-type that correctly 
> 
> I'd prefer 'accurately' to 'correctly' in the above.

That's fine.

> > labels its content, and likewise it's essential that user agents such as 
> > browsers interpret the received data per the specifications for that 
> > media-type.
> > Unfortunately, there are many servers on the Web that are not properly 
> > configured, and which serve incorrect Content-types.
> 
> Isn't it the case that some servers accept content labelled 
> incorrectly (or not at all) by another author (or server) 

I assume so.

> and then simply serve it with the Content-type supplied by the 
> original author (or possibly attempt to "sniff" the content 
> themselves)? 

I guess the way I look at it is:  each server is responsible for the 
"accuracy" of the Content-types it supplies.  If I'm configuring such a 
server, and I'm tempted to "sniff" this mysterious content, then it's my 
job to be very, very sure that such sniffing will do the right thing in 
all cases I'll actually encounter.

So, if I happen to know something about the sources of this data that I'm 
relaying, and I'm really sure that everything that appears to be a jpeg 
really is, then it's fine for me to set the Content-type accordingly.  In 
fact, that's pretty close to what we're doing when we configure a 
.htaccess file to infer Content-types from server-side filename 
extensions;  we're taking responsibility for the fact that all *.html 
files really are HTML, that all .jpg files are JPEGs, and so on.

Conversely, ignorance is no excuse in my opinion.  If such sniffing causes 
you to mislabel content, it's a bug, and you should find some way to fix 
it.

> Should they then be configured to always send an 
> empty Content-type as a more honest admission that they don't 
> know whether the Content-type associated with the content by 
> some other party is accurate? 

I remain a bit unsure on the nuances in choosing between empty content 
type and application/octet-stream, but otherwise "yes".
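One way to see the nuance: RFC 2616, section 7.2.1, says a recipient MAY guess the media type only when no Content-Type is given, and SHOULD treat it as application/octet-stream if the type remains unknown. An omitted Content-Type therefore invites sniffing, whereas an explicit application/octet-stream does not. The Python sketch below models that recipient logic; `effective_media_type` and `toy_guess` are hypothetical names, and treating an empty header value like an absent one is an assumption.

```python
from typing import Callable, Optional

OCTET_STREAM = "application/octet-stream"


def effective_media_type(
    content_type: Optional[str],
    body: bytes,
    guess: Callable[[bytes], Optional[str]],
) -> str:
    """Media type a recipient acts on, modeled on RFC 2616 section 7.2.1."""
    if content_type and content_type.strip():
        # A Content-Type was supplied: honor it (parameters such as charset dropped).
        return content_type.split(";", 1)[0].strip().lower()
    # No usable Content-Type (assumption: empty is treated like absent),
    # so the recipient MAY inspect the content...
    guessed = guess(body)
    # ...and SHOULD fall back to application/octet-stream if still unknown.
    return guessed if guessed is not None else OCTET_STREAM


def toy_guess(body: bytes) -> Optional[str]:
    """Hypothetical guesser for illustration: recognizes only a leading <html tag."""
    return "text/html" if body.lstrip().lower().startswith(b"<html") else None


page = b"<html><body>Hello</body></html>"
print(effective_media_type(None, page, toy_guess))          # text/html (guessed)
print(effective_media_type(OCTET_STREAM, page, toy_guess))  # application/octet-stream
```

The same bytes are rendered as HTML when the label is omitted, but stay opaque when the server says application/octet-stream explicitly.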

> I'm happy making these changes modulo my comments above.

Good.  Let's see what other TAG members think.  Thank you.

Noah

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








John Kemp <john@jkemp.net>
01/08/2010 08:19 AM
 
        To:     noah_mendelsohn@us.ibm.com
        cc:     Larry Masinter <masinter@adobe.com>, "www-tag@w3.org WG" <www-tag@w3.org>
        Subject:        Re: ACTION-308 (part 2) Updates to 'The Self-Describing Web'


Hi Noah,

On Jan 7, 2010, at 5:21 PM, noah_mendelsohn@us.ibm.com wrote:

> John Kemp wrote:
> 
>> On Jan 6, 2010, at 4:52 PM, noah_mendelsohn@us.ibm.com wrote:
>> 
>> [...]
>> 
>>> Furthermore, the draft text really doesn't explain how allowance for 
>>> sniffing would change the rest of the SDW story.
>> 
>> And that was deliberate. I am not "allowing sniffing" so much 
>> as saying, "if you are going to sniff then do it this way". I 
>> didn't intend to change the meaning of the SDW story at all, or
>> its relationship to the use of authoritative metadata.
> 
> I know, but that's what I'm unhappy about.  I think that once we even 
> bring up the possibility, we should explain the implications.

Fair enough.

> 
> 
> Your draft text is:
> 
> <original>
> As noted above, and for other reasons (such as content aggregation), it 
> may not be possible for a browser to reliably determine, via inspection of 
> a Content-Type HTTP header or other external metadata alone, the intended 
> interpretation of Web content. In such cases, a browser may inspect the 
> content directly (commonly known as "sniffing"). The consequences of such 
> an action are described in [AuthoritativeMetadata]. In particular, 
> sniffing Web content should only be done using an accepted and secure 
> algorithm, such as [BarthSniff].
> </original>
> 
> I would probably be happier with something close to:
> 
> <proposed>
> For the Web to have the desirable properties described in this finding, 
> it's essential that content be served with a media-type that correctly 

I'd prefer 'accurately' to 'correctly' in the above.

> labels its content, and likewise it's essential that user agents such as 
> browsers interpret the received data per the specifications for that 
> media-type.
> Unfortunately, there are many servers on the Web that are not properly 
> configured, and which serve incorrect Content-types.

Isn't it the case that some servers accept content labelled incorrectly 
(or not at all) by another author (or server) and then simply serve it with 
the Content-type supplied by the original author (or possibly attempt to 
"sniff" the content themselves)? Should they then be configured to always 
send an empty Content-type as a more honest admission that they don't know 
whether the Content-type associated with the content by some other party 
is accurate? 

>  In particular,
> content intended to be interpreted as text/html, image/jpeg or other 
> common types is sometimes served as text/plain.
> Such incorrect labeling of content is contrary to Web architecture, and it 
> undermines many of the valuable Web characteristics described by this 
> finding.
> 
> Nonetheless, in part because such mislabeled content is common, certain 
> browsers and other user agents have been coded to guess or "sniff" the 
> intended content type, particularly for responses that are explicitly 
> typed as text/plain.  Such sniffing breaks the chain of accountability 
> described in this finding, making it more difficult for a user to hold the 
> publisher responsible for a document's contents.
> 
> Other negative consequences of sniffing are described in 
> [AuthoritativeMetadata].  For example, "sniffing" can also expose the user 
> agent to security vulnerabilities;  these can to some degree be minimized 
> by using more secure algorithms, such as the ones described in 
> [BarthSniff].
> </proposed>
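The accountability concern can be illustrated with a deliberately naive sniffer. This Python sketch is NOT the [BarthSniff] algorithm or any browser's actual behavior; the signature list is a small hand-picked subset and `naive_sniff` is a hypothetical name. It shows how a plain-text note that merely begins with the characters "GIF89a" could be reinterpreted as an image once a user agent overrides the text/plain label.

```python
# Leading-byte signatures for a few common image formats (illustrative subset).
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def naive_sniff(declared: str, body: bytes) -> str:
    """Toy sniffer: overrides a text/plain label when the body matches a signature.

    Hypothetical and intentionally naive; NOT [BarthSniff].
    """
    if declared.split(";", 1)[0].strip().lower() == "text/plain":
        for magic, media_type in SIGNATURES:
            if body.startswith(magic):
                # The publisher said text/plain; the user agent decides otherwise.
                return media_type
    # Any other label, or a body with no matching signature, is left as declared.
    return declared


note = b"GIF89a is one version of the GIF image format."
print(naive_sniff("text/plain", note))  # image/gif
```

The publisher labeled that note accurately, yet what the reader sees is the user agent's guess, which is the break in the chain of accountability the proposed text describes.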

I'm happy making these changes modulo my comments above.

> 
> This might actually go in as a new, short Chapter 7 in SDW, I think. That 
> would bump the conclusions section to become #8.

That sounds fine to me.

Regards,

- johnk

> 
> 
>>> After all, we give 
>>> examples in which providers of data are held legally accountable for 
>>> having published certain content, precisely because the chain of normative 
>>> specifications makes clear their correct interpretation.  In a world where 
>>> people start to "sniff", am I accountable for the (mis) interpretation of 
>>> something served as text/plain that just happens to resemble some other 
>>> media type?  The whole point of SDW is to tell stories like that.
>>> 
>>> So, I agree with Larry that we should steer clear of elevating sniffing to 
>>> being even a good practice at the architecture level (it's not a 
>>> "principle" in the sense of AWWW principles in any case);  even if we do 
>>> want to acknowledge the widespread use of sniffing in practice in a 
>>> revised SDW, I think it behooves us to carefully explain how the core 
>>> stories about accountability and lack of ambiguity are affected.
>> 
>> I agree that it would be good to explain the ambiguity 
>> introduced by sniffing.
> 
> See above for a rough proposal
> 
>>> I think 
>>> we have two choices:  1) leave SDW alone -- it tells a quite coherent 
>>> story at the architecture level, and we can view instances of sniffing as 
>>> deviations from the architecture
>>> or 2) do a very careful job of explaining 
>>> just what does and doesn't change in the SDW story given that sniffing 
>>> happens.
>> 
>> I have roughly attempted your choice 1) with the understanding 
>> that this was the will of the group. As you note though, we 
>> could do a much more careful job of explaining what changes 
>> given that sniffing happens.
>> 
>> Regards,
>> 
>> - johnk
> 
> 
> 
> 
> 
> 
> 
> 
> John Kemp <john@jkemp.net>
> Sent by: www-tag-request@w3.org
> 01/07/2010 10:15 AM
> 
>        To:     noah_mendelsohn@us.ibm.com
>        cc:     Larry Masinter <masinter@adobe.com>, "www-tag@w3.org WG" 
> <www-tag@w3.org>
>        Subject:        Re: ACTION-308 (part 2) Updates to 'The 
> Self-Describing Web'
> 
> 
> 
> On Jan 6, 2010, at 4:52 PM, noah_mendelsohn@us.ibm.com wrote:
> 
> [...]
> 
>> Furthermore, the draft text really doesn't explain how allowance for 
>> sniffing would change the rest of the SDW story.
> 
> And that was deliberate. I am not "allowing sniffing" so much as saying, 
> "if you are going to sniff then do it this way". I didn't intend to change 
> the meaning of the SDW story at all, or its relationship to the use of 
> authoritative metadata.
> 
>> After all, we give 
>> examples in which providers of data are held legally accountable for 
>> having published certain content, precisely because the chain of normative 
>> specifications makes clear their correct interpretation.  In a world where 
>> people start to "sniff", am I accountable for the (mis) interpretation of 
>> something served as text/plain that just happens to resemble some other 
>> media type?  The whole point of SDW is to tell stories like that.
>> 
>> So, I agree with Larry that we should steer clear of elevating sniffing to 
>> being even a good practice at the architecture level (it's not a 
>> "principle" in the sense of AWWW principles in any case);  even if we do 
>> want to acknowledge the widespread use of sniffing in practice in a 
>> revised SDW, I think it behooves us to carefully explain how the core 
>> stories about accountability and lack of ambiguity are affected.
> 
> I agree that it would be good to explain the ambiguity introduced by 
> sniffing.
> 
>> I think 
>> we have two choices:  1) leave SDW alone -- it tells a quite coherent 
>> story at the architecture level, and we can view instances of sniffing as 
>> deviations from the architecture
>> or 2) do a very careful job of explaining 
>> just what does and doesn't change in the SDW story given that sniffing 
>> happens.
> 
> I have roughly attempted your choice 1) with the understanding that this 
> was the will of the group. As you note though, we could do a much more 
> careful job of explaining what changes given that sniffing happens.
> 
> Regards,
> 
> - johnk
> 
>> 
>> Noah
>> 
Received on Monday, 1 February 2010 22:40:47 GMT
