[Bug 9071] Handling of "[" in between-doctype-public-and-system-identifiers-state may not be ideal

http://www.w3.org/Bugs/Public/show_bug.cgi?id=9071





--- Comment #12 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>  2010-02-25 05:08:31 ---
(In reply to comment #10)
> I changed my regexps to only look at pages that match this:
> 
>    /<!doctype\s+html\s+public\s+"[^"]+"\s*\[/i
>    /<!doctype\s+html\s+public\s+'[^']+'\s*\[/i
> 
> The results, looking for this data in the Google index, found that about
> 0.000125% of pages have this particular DOCTYPE pattern. Note, though, that this
> doesn't include DOCTYPEs that are simply bogus, e.g. that have a missing quote
> in the system identifier part, which Philip's data _does_ catch.
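The two patterns quoted above can be tried out directly. Below is a minimal Python sketch, assuming Python's `re` syntax is an acceptable stand-in for whatever engine was run over the index, and assuming the single-quoted pattern was meant to end in `\[` like the double-quoted one. The function name and sample strings are illustrative only.

```python
import re

# The two patterns from comment #10, ported to Python regexes.
# Assumption: the single-quoted variant also ends in "\[".
PATTERNS = [
    re.compile(r'<!doctype\s+html\s+public\s+"[^"]+"\s*\[', re.IGNORECASE),
    re.compile(r"<!doctype\s+html\s+public\s+'[^']+'\s*\[", re.IGNORECASE),
]


def has_bracket_after_public_id(markup: str) -> bool:
    """True if a DOCTYPE's quoted public identifier is followed by '['."""
    return any(p.search(markup) for p in PATTERNS)


samples = {
    # "[url=" where a quote should open the system identifier
    "xhtml_bbcode": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
                    '[url=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    # an ordinary, well-formed DOCTYPE
    "normal": '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" '
              '"http://www.w3.org/TR/html4/strict.dtd">',
}

for name, markup in samples.items():
    print(name, has_bracket_after_public_id(markup))
```

Like the original patterns, this only counts a `[` directly after the public identifier; DOCTYPEs that are bogus in other ways are not matched.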

In the debate that Philip and I had on public-html, Philip brought forward a
page that used an HTML 4.0 (not 4.01) Transitional doctype, which was thus
quirks-triggering _because of that_, but which Philip, at first, thought was
quirks-triggering because of the "[]" characters.

And I think you have done the same thing here. E.g., it seems you included one
page from the same site that Philip mentioned:

HTML40Transitional: http://symptomresearch.nih.gov/chapter_13/sec5/ckns5pg2.htm

And these use an HTML 3.0 or 3.2 doctype:

HTML30: http://www.jazz-network.com/kumpf/p-lyrik.html
HTML32: http://www.daiwaint.co.jp/stock/Sheets/SK1.htm
HTML32: http://www.gallex.ch/gallex/1/141.41.html
HTML32: http://www.sugano-foods.co.jp/products2.html
HTML30: http://runker_room.tripod.com/tiestalk/japped.htm
HTML32: http://aune.lpl.univ-aix.fr/projects/multext/CES/CES1.Annex7.html
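Why such pages are quirks for reasons unrelated to "[" can be sketched as a public-identifier lookup. The HTML5 parsing rules put a document in quirks mode when the DOCTYPE's public identifier starts with any of a long list of legacy prefixes, compared ASCII case-insensitively. The Python below hard-codes only a small subset of that list (HTML 3.0, 3.2 and 4.0 Transitional/Frameset) for illustration; it is not the full rule, and the function name is hypothetical.

```python
# Illustrative subset of the legacy public-identifier prefixes that force
# quirks mode under the HTML5 parsing rules; the real list is much longer.
QUIRKS_PUBLIC_ID_PREFIXES = (
    "-//ietf//dtd html 3.0//",
    "-//w3o//dtd w3 html 3.0//",
    "-//w3c//dtd html 3.2 final//",
    "-//w3c//dtd html 3.2//",
    "-//w3c//dtd html 4.0 frameset//",
    "-//w3c//dtd html 4.0 transitional//",
)


def quirks_from_public_id(public_id: str) -> bool:
    """True if the public identifier alone forces quirks (subset only)."""
    # The rules compare ASCII case-insensitively; lower() is close enough
    # for these all-ASCII identifiers.
    return public_id.lower().startswith(QUIRKS_PUBLIC_ID_PREFIXES)


for pid in ("-//W3C//DTD HTML 4.0 Transitional//EN",
            "-//W3C//DTD HTML 3.2 Final//EN",
            "-//W3C//DTD HTML 4.01 Transitional//EN"):
    print(pid, quirks_from_public_id(pid))
```

HTML 4.01 Transitional is deliberately absent: under the same rules it only forces quirks when the system identifier is missing, which a prefix check on the public identifier alone cannot capture.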

The following page has the doctype in the middle of the document, and should
therefore be in quirks mode because, in the DOM, it has no doctype:

http://www.boekbesprekingen.nl/cgi-bin/auteur.cgi?auteur=311737&type=biografie

This page has a standards-mode doctype where it should be, and an erroneous
doctype in the middle of the document; thus, in the DOM, it _has_ a correct,
standards-triggering doctype:

http://www.diolla.ru/catalog/pharmacy/preparates/anticought/nose-drops/p_103129
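The two cases above follow from how the tree builder treats DOCTYPE tokens. Below is a simplified Python sketch, assuming a pre-tokenized stream of hypothetical `(kind, value)` pairs rather than a real parser: only a DOCTYPE seen in the "initial" insertion mode (before anything other than whitespace and comments) ends up in the DOM; any other token there means no DOCTYPE and quirks mode, and later DOCTYPE tokens are ignored as parse errors.

```python
from typing import List, Optional, Tuple

# A token is a hypothetical (kind, value) pair, e.g. ("doctype", "html").
Token = Tuple[str, str]


def effective_doctype(tokens: List[Token]) -> Optional[str]:
    """Return the DOCTYPE that ends up in the DOM, or None (quirks mode).

    Models only the "initial" insertion mode: comments and whitespace are
    passed over, the first DOCTYPE found here is used, and any other token
    (or end of input) means the document has no DOCTYPE. DOCTYPE tokens
    arriving later are parse errors and are ignored.
    """
    for kind, value in tokens:
        if kind in ("comment", "whitespace"):
            continue
        if kind == "doctype":
            return value
        return None  # real content came first: no DOCTYPE, quirks mode
    return None


# Hypothetical token streams mirroring the two kinds of page described above.
doctype_in_middle = [("start_tag", "html"), ("start_tag", "body"),
                     ("doctype", "-//W3C//DTD HTML 4.01//EN")]
standards_first = [("doctype", "-//W3C//DTD XHTML 1.0 Strict//EN"),
                   ("start_tag", "html"), ("doctype", "bogus")]

print(effective_doctype(doctype_in_middle))  # None
print(effective_doctype(standards_first))    # -//W3C//DTD XHTML 1.0 Strict//EN
```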

ALL THE PAGES I HAVE MENTIONED ABOVE do not belong in the "bookkeeping" that
you are trying to perform here, Ian. They are irrelevant to the issue we are
discussing.

--------

Now, there were very many XHTML pages among your selection, and all of them
contained "[url=" instead of a quote mark in the system identifier part.
Typically this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
[url=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

(<http://www.gpees.co.uk/content/001_Search/004_New_Homes/>)

In IE8 this doctype does not trigger quirks mode, and Firefox does not switch
to quirks either. The Opera 10.5 beta does, but released versions of Opera do
not.
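The difference comes down to what the tokenizer does when it meets "[" where the system identifier's opening quote should be. Below is a simplified Python sketch modelled on the WHATWG HTML tokenizer rules (assumed here to match the draft under discussion). It collapses the "after DOCTYPE public identifier" and "between DOCTYPE public and system identifiers" states into one function, skips parse-error reporting, and uses a hypothetical function name. Any character other than whitespace, ">" or a quote, including "[", sets the force-quirks flag.

```python
from typing import Optional, Tuple

# Characters the tokenizer treats as whitespace inside a DOCTYPE.
WHITESPACE = "\t\n\x0c "


def after_public_id(rest: str) -> Tuple[Optional[str], bool]:
    """Return (system_id, force_quirks) for the input that follows the
    closing quote of a DOCTYPE public identifier (simplified sketch)."""
    i = 0
    while i < len(rest) and rest[i] in WHITESPACE:
        i += 1  # whitespace between the two identifiers is skipped
    if i == len(rest):
        return None, True  # EOF inside the DOCTYPE: force quirks
    c = rest[i]
    if c == ">":
        return None, False  # no system identifier; DOCTYPE ends here
    if c in "\"'":
        # The system identifier runs to the matching quote; hitting ">"
        # or EOF first means it was cut short, which forces quirks.
        j = i + 1
        while j < len(rest) and rest[j] not in (c, ">"):
            j += 1
        system_id = rest[i + 1:j]
        if j == len(rest) or rest[j] == ">":
            return system_id, True
        # After the closing quote: EOF before ">" also forces quirks;
        # other stray characters are parse errors but do not.
        k = j + 1
        while k < len(rest) and rest[k] in WHITESPACE:
            k += 1
        return system_id, k == len(rest)
    # Anything else, including "[": parse error, force-quirks flag on,
    # and the remainder is skipped in the bogus DOCTYPE state.
    return None, True


bbcode = '\n[url=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
print(after_public_id(bbcode))  # (None, True)

quoted = ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
print(after_public_id(quoted))
# ('http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd', False)
```

With the force-quirks flag on, the document is in quirks mode whatever its public identifier says, which is why a parser following these rules treats the "[url=" DOCTYPEs differently from IE8 and Firefox as reported above.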


Received on Thursday, 25 February 2010 05:08:33 UTC