- From: Karl Dubost <karl@w3.org>
- Date: Tue, 3 Oct 2006 13:36:03 +0900
- To: Ian Hickson <ian@hixie.ch>
- Cc: www-qa@w3.org
On 30 Sept. 2006, at 05:56, Ian Hickson wrote:
> On Fri, 29 Sep 2006, Karl Dubost wrote:
>>
>> It is why I have asked more details to Ian Hickson, because I really
>> think it is as much important as the derived statistics which have been
>> published in the [previous survey][1]. When the sample is not given or
>> clearly identified it is really difficult to draw meaningful
>> conclusions.
>
> This is absolutely true. This is why the survey(s) haven't been published
> formally; due to the nature of the way in which the results were obtained,
> I can't write a scientific report.
1. Agreed on "we can't draw meaningful conclusions". It is not a suitable
scientific report.
> The data was collected for the purposes
> of helping WHATWG's spec development work
2. Google created the survey to help WHATWG.
> (I think all specifications
> should be written based on solid research of authoring practices, etc),
> and I consider the data to be suitably representative for that purpose.
3. The survey is "solid research of authoring practices".
> For other purposes, the data probably isn't useful as anything other than
> an idle curiosity, and I would not recommend treating it as anything but
> that.
I have a hard time connecting 1, 2 and 3 in a logical way.
>
> If you would like a more formal survey of the Web, I recommend
> commissioning your own. :-)
It is a good idea.
Maybe I should ask TV Raman (Google) whether Google would agree to help
us do that.
>> - DOCTYPE
>
> I'm not sure how you would define this; take this document, for
> instance:
>
> http://damowmow.com/playground/html-or-xml.html
> What's the DOCTYPE?
> How about this one:
> http://damowmow.com/playground/html-or-xml.xml
Do you mean there are plenty of these documents on the Web?
Or are these just corner cases that have been created to identify
potential problems?
Using http://web-sniffer.net/:
GET /playground/html-or-xml.xml HTTP/1.1[CRLF]
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[CRLF]
GET /playground/html-or-xml.html HTTP/1.1[CRLF]
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[CRLF]
GET /playground/html-or-xml HTTP/1.1[CRLF]
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[CRLF]
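To make the preference order in that Accept header explicit, here is a minimal sketch in Python. The function name `parse_accept` is hypothetical, and the parsing is deliberately simplified: it only reads the `q` parameter and ignores everything else that HTTP content negotiation allows.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, highest preference first."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


# The Accept header sent in the three requests above.
ACCEPT = ("text/xml,application/xml,application/xhtml+xml,"
          "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5")

for media_type, q in parse_accept(ACCEPT):
    print(f"{media_type:25} q={q}")
```

With this header the XML types rank above text/html, so a server that negotiates on Accept is free to prefer an XML media type for the same bytes.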
Here is the source:
################
<?test ><!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
<html><?test ><!-- ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<?test --><?test?>
<head>
<title>HTML or XML?</title>
</head>
<body>
<p>Is this file HTML or XML?</p>
<p>Why, it's <?test > HTML <!-- ?> XHTML <?test --> <?test ?> of
course!</p>
</body>
</html>
################
How many documents with this kind of structure have you found on the
Web?
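To show why a plain text scan cannot settle "What's the DOCTYPE?" for this file, here is a minimal sketch in Python. `SOURCE` holds the first five lines of the file above; the regular expression and the function name `doctype_public_ids` are my own assumptions for illustration, not anything used in the survey.

```python
import re

# First five lines of the test document, copied from the source above.
SOURCE = '''<?test ><!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
<html><?test ><!-- ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
'''

# Naive pattern: any DOCTYPE declaration with a PUBLIC identifier.
# It knows nothing about comments or processing instructions.
PUBLIC_ID = re.compile(r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def doctype_public_ids(text):
    """Return every public identifier a naive text scan can find."""
    return PUBLIC_ID.findall(text)


for public_id in doctype_public_ids(SOURCE):
    print(public_id)
```

The scan reports both the HTML 4.0 and the XHTML 1.0 Strict identifiers. Which one counts depends on whether the bytes are read with HTML or XML parsing rules, and a text scan alone cannot decide that.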
> What's the DOCTYPE?
> If your answer was different for the two pages, then why was it different?
> The two pages are byte-for-byte identical. If your answer was the same,
> then why were they the same? Browsers treat the two very differently.
Your document is sent as text/xml,
then as application/xhtml+xml,
and then as text/html if the first is not understood.
On top of that, there is the problem of encoding.
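To make the media type point concrete, here is a minimal sketch in Python of how a consumer might pick a parser from the Content-Type alone. The function name `parser_for` and the labels it returns are hypothetical, and this is a simplification: it ignores the charset parameter and any content sniffing a real browser may do.

```python
def parser_for(content_type):
    """Pick a parsing mode from a Content-Type header value (simplified)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type in ("text/xml", "application/xml") or media_type.endswith("+xml"):
        return "xml"
    return "unknown"


# The same bytes, labelled with the three media types discussed above.
for value in ("text/xml", "application/xhtml+xml", "text/html; charset=utf-8"):
    print(value, "->", parser_for(value))
```

Under this rule the file is handed to an XML parser in the first two cases and to an HTML parser in the third, even though the bytes never change.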
> (This is why my survey mostly ignored the DOCTYPE and instead just assumed
> HTML5 parsing rules.)
So Google created a "WebApps 1.0 parser" for the purpose of the
survey?
Is the code accessible somewhere?
Was it a crawler?
Was it a parser working on files outside of their HTTP context?
--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
QA Weblog - http://www.w3.org/QA/
*** Be Strict To Be Cool ***
Received on Tuesday, 3 October 2006 04:36:24 UTC