[Bug 9071] Handling of "[" in between-doctype-public-and-system-identifiers-state may not be ideal

http://www.w3.org/Bugs/Public/show_bug.cgi?id=9071





--- Comment #16 from Simon Pieters <simonp@opera.com>  2010-02-25 22:17:42 ---
(In reply to comment #10)
> I changed my regexps to only look at pages that match this:
> 
>    /<!doctype\s+html\s+public\s+"[^"]+"\s*\[/i
>    /<!doctype\s+html\s+public\s+'[^']+'\s*\[/i
> 
> The results, looking for this data in the Google index, found about 0.000125%
> of pages have this particular DOCTYPE pattern. Note, though, that this
> doesn't include DOCTYPEs that are simply bogus, e.g. that have a missing quote
> in the system identifier part, which Philip's data _does_ catch.
> 
> Here's a random selection of some of the matching pages:
> 
> http://www.austinwyatt.co.uk/property-details-rpsMSE-AWE090138
doesn't matter

> http://www.bairstow-eves.co.uk/content/011_Legal_Information
doesn't matter (doesn't seem to have css applied)

> http://www.bairstoweves.co.uk/content/008_Offices/
doesn't matter

> http://www.boekbesprekingen.nl/cgi-bin/auteur.cgi?auteur=311737&type=biografie
N/A

> http://www.chappellandmatthews.co.uk/content/006_Information/001_HIPs/
doesn't matter

> http://www.countrywidescotland.co.uk/property-details-rpsCWN-ADR080332
doesn't matter

> http://www.daiwaint.co.jp/stock/Sheets/SK1.htm
doesn't matter

> http://www.diolla.ru/catalog/pharmacy/preparates/anticought/nose-drops/p_103129
N/A

> http://www.entwistlegreen.co.uk/property-details-rpsBAD-RUN090794
doesn't matter

> http://europroject.pl/index.php?pid=3:15:44
slight layout change, but doesn't matter

> http://www.frankinnes.co.uk/content/001_Contact_Us/001_Sales/
doesn't matter

> http://www.gallex.ch/gallex/1/141.41.html
doesn't matter

> http://www.gpees.co.uk/content/001_Search/004_New_Homes/
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
[url=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
has different spacing in standards mode and quirks mode. Can't tell which is
intended.

> http://www.jazz-network.com/kumpf/p-lyrik.html
doesn't matter

> http://www.manncountrywide.co.uk/property-details-rpsMSE-CWS090264
doesn't matter

> http://symptomresearch.nih.gov/chapter_13/sec5/ckns5pg2.htm
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" []>
needs quirks

> http://oestjyllandsflyt.dk/privatflytning/flyttetilbud/
doesn't matter

> http://www.palmersnell.co.uk/content/002_To_Let/002_Lettings/
doesn't matter

> http://www.spencers.co.uk/content/002_To_Let/001_Lettings_Area_Search/
doesn't matter

> http://www.strattoncreber.co.uk/property-details-rpsSTC-REH090342
doesn't matter

> http://www.sugano-foods.co.jp/products2.html
N/A

> http://runker_room.tripod.com/tiestalk/japped.htm
doesn't matter

> http://www.lpl.univ-aix.fr/projects/multext/CES/CES1.Annex7.html
doesn't matter

> http://www.winncom.com/moreinfo/item/5054-BSUR-LR-US/index.html
N/A


Many of these seem to be based on the same template (the ones that have [url=
in the doctype).
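
For reference, here is a rough sketch of how the two patterns quoted from
comment #10 behave on the two doctypes quoted above. It assumes the
single-quote pattern is meant to end in \[ like the double-quote one; the
function name and the sample labels are made up for illustration, and the
third sample is a well-formed doctype added only for contrast.

```python
import re

# Patterns from comment #10. Assumption: the single-quote variant ends in \[
# just like the double-quote variant.
PATTERNS = [
    re.compile(r'<!doctype\s+html\s+public\s+"[^"]+"\s*\[', re.IGNORECASE),
    re.compile(r"<!doctype\s+html\s+public\s+'[^']+'\s*\[", re.IGNORECASE),
]


def has_bracket_after_public_id(markup: str) -> bool:
    """Return True if a "[" follows the quoted public identifier."""
    return any(p.search(markup) for p in PATTERNS)


# Hypothetical labels; the first two doctypes are the ones quoted in this
# comment, the third is an ordinary well-formed doctype for contrast.
SAMPLES = {
    "gpees": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
        '[url=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    ),
    "symptomresearch": (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" []>'
    ),
    "well_formed": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    ),
}

RESULTS = {name: has_bracket_after_public_id(text) for name, text in SAMPLES.items()}

if __name__ == "__main__":
    for name, matched in RESULTS.items():
        print(name, matched)
```

Since \s* also matches a newline, the [url= on the following line still
counts as coming directly after the public identifier.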


> I'm leaning towards not changing the spec, based on the rarity of this and
> based on Simon's findings earlier in this bug.

We could make "[" after the public identifier switch to the bogus doctype
state without setting force-quirks, while any other garbage character still
sets force-quirks ("S" and "/" needed force-quirks according to my earlier
findings). That would not regress compat. However, only one of the analyzed
pages is affected by it (and for that page I couldn't tell whether it would
actually be helped or not), so I would suggest that we Avoid Needless
Complexity.
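
To make the alternative concrete, here is a minimal sketch of the rule
described above, not spec text. The state names are simplified stand-ins,
the function name is hypothetical, and EOF handling is left out. It only
models the character that follows the public identifier.

```python
from typing import Tuple

# Whitespace characters the tokenizer skips in this state (assumption: CR has
# already been normalized away by the input stream preprocessing).
WHITESPACE = "\t\n\x0c "


def next_state_after_public_id(ch: str) -> Tuple[str, bool]:
    """Return (next state, force-quirks flag) for one character.

    Sketch of the alternative discussed above: "[" goes to the bogus doctype
    state without force-quirks, any other garbage character sets force-quirks.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if ch in WHITESPACE:
        return ("between-doctype-public-and-system-identifiers", False)
    if ch == '"':
        return ("doctype-system-identifier-double-quoted", False)
    if ch == "'":
        return ("doctype-system-identifier-single-quoted", False)
    if ch == ">":
        return ("data", False)
    if ch == "[":
        # The special case under discussion: bogus doctype, no force-quirks.
        return ("bogus-doctype", False)
    # Anything else, e.g. "S" or "/": bogus doctype with force-quirks set.
    return ("bogus-doctype", True)


if __name__ == "__main__":
    for ch in ["[", "S", "/", '"', ">"]:
        print(repr(ch), next_state_after_public_id(ch))
```

The extra branch for "[" is the whole cost of the change, which is the
complexity being weighed against a single affected page.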



Received on Thursday, 25 February 2010 22:17:44 UTC