
Re: no mention of BOMs in sniffing algorithm

From: ryan <ryan@theryanking.com>
Date: Tue, 21 Aug 2007 11:35:10 -0700
Message-Id: <0B72581E-399E-4EB8-912D-B0B1471B288F@theryanking.com>
Cc: public-html@w3.org
To: Dan Connolly <connolly@w3.org>
On Aug 21, 2007, at 7:44 AM, Dan Connolly wrote:

> On Mon, 2007-08-20 at 17:28 -0700, ryan wrote:
>> Section 4.7.4, which deals with sniffing for different content types,
>> has no mention of BOMs.[1]
>> In implementing this, I encountered a case where this failed:
>>    http://www.armencomp.com/tradelog/trader_tax_topics.rss
> Thanks for finding a specific case.
> It's awkward for this WG to use that document as a test case,
> as it's not clear that we have license to republish it,
> create derivative works, etc.

Agreed. I tend to use cases like this in my work, but that's mostly a  
private test suite.

> Would you please create a file that exhibits the same issue,
> and attach it to a message to this WG?

Done, test attached (hope it doesn't get mangled).
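The failure mode the test exercises can be illustrated with a minimal Python sketch. It assumes a simplified feed-vs-HTML check (RSS and Atom root elements only, RDF omitted) and is not the spec's algorithm verbatim; the function name `sniff_feed_or_html` is hypothetical. The step that matters is skipping a leading UTF-8 BOM (bytes EF BB BF) before looking for markup. In this sketch, dropping that step makes a BOM-prefixed feed fail the `<` check and fall through to text/html.

```python
# Hypothetical sketch: feed-vs-HTML sniffing that tolerates a leading UTF-8 BOM.
# Simplified (no RDF check); return labels are illustrative.

UTF8_BOM = b"\xef\xbb\xbf"

def sniff_feed_or_html(data: bytes) -> str:
    """Return 'application/rss+xml', 'application/atom+xml', or 'text/html'."""
    pos = 0
    # Skip a UTF-8 BOM if present; without this, the '<' check below fails
    # on the first byte and the resource is treated as text/html.
    if data.startswith(UTF8_BOM):
        pos = len(UTF8_BOM)
    while pos < len(data):
        # Skip whitespace between constructs.
        if data[pos] in b" \t\r\n":
            pos += 1
            continue
        if data[pos:pos + 1] != b"<":
            break
        # Skip comments.
        if data.startswith(b"<!--", pos):
            end = data.find(b"-->", pos + 4)
            if end == -1:
                break
            pos = end + 3
            continue
        # Skip processing instructions (<?xml ...?>) and declarations (<!DOCTYPE ...>).
        if data.startswith(b"<?", pos) or data.startswith(b"<!", pos):
            end = data.find(b">", pos + 2)
            if end == -1:
                break
            pos = end + 1
            continue
        # First real element decides the answer.
        if data.startswith(b"<rss", pos):
            return "application/rss+xml"
        if data.startswith(b"<feed", pos):
            return "application/atom+xml"
        break
    return "text/html"
```

For example, `sniff_feed_or_html(b'\xef\xbb\xbf<?xml version="1.0"?>\n<rss version="2.0">')` returns `"application/rss+xml"`.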

> There are perhaps other places you could put it and still
> make it clear that you're contributing it to this WG, but
> that's the simplest one that occurs to me just now.

I contribute tests to the html5lib project[1]. I, and I assume most
people working on that code, would be happy to contribute those tests[2]
to the WG.

>> Though this resource shouldn't be a problem for the sniffing algorithm
>> (since it's served as text/plain, which shouldn't trigger the feed vs.
>> HTML sniffing), it still illustrates the problem.
>> Also, Firefox treats this as a feed, while Safari treats it as plain
>> text.
> Interesting.
> Have you given any thought to a format for expected results
> for a test case such as this?

For sniffing I've created one in the html5lib project already[3].  
It's JSON that includes a stream of bytes plus the type we expect the  
sniffing algorithm to return.
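A minimal sketch of how such a file might be consumed, assuming hypothetical keys `input` and `type` (not necessarily the field names html5lib actually uses). Since JSON has no byte type, this sketch stores each byte as a code point in U+0000 to U+00FF and recovers the bytes with latin-1 encoding. `naive_sniff` is a deliberately simplistic, hypothetical stand-in that ignores BOMs, so the BOM-prefixed case shows up as a failure.

```python
import json

# Hypothetical test data; keys "input" and "type" are assumptions.
# Bytes 0x00-0xFF are stored as code points U+0000-U+00FF.
TESTS_JSON = r'''
[
  {"input": "<rss version=\"2.0\"><channel></channel></rss>", "type": "application/rss+xml"},
  {"input": "\u00ef\u00bb\u00bf<rss version=\"2.0\"></rss>", "type": "application/rss+xml"},
  {"input": "<html><head></head></html>", "type": "text/html"}
]
'''

def load_cases(text):
    """Yield (input_bytes, expected_type) pairs from the JSON test file."""
    for case in json.loads(text):
        # latin-1 maps U+0000-U+00FF one-to-one onto bytes 0x00-0xFF.
        yield case["input"].encode("latin-1"), case["type"]

def run_cases(text, sniff):
    """Run `sniff` over every case; return (input, expected, actual) for failures."""
    failures = []
    for data, expected in load_cases(text):
        actual = sniff(data)
        if actual != expected:
            failures.append((data, expected, actual))
    return failures

def naive_sniff(data):
    # Hypothetical stand-in that does not skip a BOM.
    return "application/rss+xml" if data.startswith(b"<rss") else "text/html"
```

Here `run_cases(TESTS_JSON, naive_sniff)` reports a single failure, for the BOM-prefixed input, which is exactly the kind of regression such a test file is meant to catch.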

As a side note, these are all examples taken from the wild web. I'm
not sure what copyright law says about quoting a small part of a
document for the purpose of building a test suite.

> I'm interested to start capturing claims about which implementation
> passes which test in machine-readable form.
> I had fun doing this with the GRDDL tests; see
>   http://www.w3.org/2001/sw/grddl-wg/td/test_results
>   http://www.w3.org/2001/sw/grddl-wg/td/earlsum.py
>   http://www.w3.org/TR/grddl-tests/#earl-reporting
>> -ryan
>> 1. http://www.whatwg.org/specs/web-apps/current-work/#content-type3
>  aka http://www.w3.org/html/wg/html5/#content-type3

Of course. :) I've been using the WHATWG version for several years,
so the habit's pretty ingrained.


1. http://code.google.com/p/html5lib/
2. http://html5lib.googlecode.com/svn/trunk/testdata/
3. http://html5lib.googlecode.com/svn/trunk/testdata/sniffer/

Received on Tuesday, 21 August 2007 18:35:34 UTC
