[Bug 10802] Limit the number of identical items on the list of active formatting elements by removing previous duplicates when adding new items

http://www.w3.org/Bugs/Public/show_bug.cgi?id=10802

Henri Sivonen <hsivonen@iki.fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |jgraham@opera.com,
                   |                            |jonas@sicking.cc,
                   |                            |w3c@adambarth.com
         Resolution|NEEDSINFO                   |

--- Comment #3 from Henri Sivonen <hsivonen@iki.fi> 2010-10-13 12:50:25 UTC ---
Philip ran an instrumented parser over 422814 pages that parsed successfully.
Here's an analysis of that data:

maxNonFontDuplicates (cutoff: 0.999000)
0.9422: <= 0
0.9868: <= 1
0.9928: <= 2
0.9953: <= 3
0.9965: <= 4
0.9971: <= 5
0.9975: <= 6
0.9980: <= 7
0.9983: <= 8
0.9986: <= 9
0.9987: <= 10
0.9989: <= 11
Max: 7687

maxFontDuplicates (cutoff: 0.999000)
0.9468: <= 0
0.9826: <= 1
0.9890: <= 2
0.9918: <= 3
0.9933: <= 4
0.9943: <= 5
0.9950: <= 6
0.9956: <= 7
0.9960: <= 8
0.9966: <= 9
0.9969: <= 10
0.9973: <= 11
0.9975: <= 12
0.9977: <= 13
0.9978: <= 14
0.9980: <= 15
0.9981: <= 16
0.9982: <= 17
0.9982: <= 18
0.9985: <= 19
0.9986: <= 20
0.9986: <= 21
0.9987: <= 22
0.9987: <= 23
0.9988: <= 24
0.9988: <= 25
0.9988: <= 26
0.9989: <= 27
0.9989: <= 28
0.9990: <= 29
Max: 6829
This means that when a non-<font> formatting element was added to the list of
active formatting elements, on 94% of pages there was never an identical element
(element name and all attribute names and values matching) already on the list
*after the latest marker, if any*. On 99% of pages, there were 2 or fewer
duplicates already on the list (after the latest marker, if any). The worst case
seen was 7687 duplicates.

In the case of <font> duplicates, on 99% of pages, there were 3 or fewer
duplicates already on the list (after the latest marker if any). The worst case
seen was 6829 duplicates.
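The tables above are cumulative distributions over pages: each row gives the
fraction of pages whose per-page maximum duplicate count was at most k. A minimal
Python sketch of how such a report could be produced follows. It assumes a
hypothetical list `per_page_max` (one non-empty list of per-page maxima) and
assumes the cutoff means "stop printing once the cumulative fraction reaches
it"; the instrumented parser's actual reporting code is not shown in this bug.

```python
from collections import Counter

def cumulative_fractions(per_page_max, cutoff=0.999):
    """Return [(k, fraction of pages with max duplicates <= k)], stopping once
    the fraction reaches cutoff (assumed meaning of the cutoff label)."""
    total = len(per_page_max)
    counts = Counter(per_page_max)  # pages per exact maximum value
    rows = []
    running = 0
    for k in range(max(per_page_max) + 1):
        running += counts[k]
        fraction = running / total
        rows.append((k, fraction))
        if fraction >= cutoff:
            break
    return rows

def report(name, per_page_max, cutoff=0.999):
    """Format the rows like the tables in this comment (hypothetical helper)."""
    lines = [f"{name} (cutoff: {cutoff:f})"]
    for k, fraction in cumulative_fractions(per_page_max, cutoff):
        lines.append(f"{fraction:.4f}: <= {k}")
    lines.append(f"Max: {max(per_page_max)}")
    return lines
```

Because the rows are cumulative, the "2 or fewer" and "3 or fewer" figures
quoted above are simply the first rows whose fraction is at least 0.99.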

The worst cases are really crazy, so it makes sense to pick some limits.
Furthermore, very low limits take care of the vast majority of cases. I'd be
inclined not to differentiate between <font> and non-<font>, and simply to
allow a maximum of two identical elements already on the list when adding a
third (so at most three identical elements end up on the list after the latest
marker).
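The proposed limit can be sketched in Python as follows. This is a simplified
model, not parser code: `Formatting`, `make_element`, `MARKER` and
`push_formatting` are hypothetical names, "identical" means the same element
name and the same attribute names and values, and removing the *earliest*
excess duplicate is an assumption (the removal strategy itself is discussed in
the public-html message referenced in this comment).

```python
from dataclasses import dataclass

MARKER = None  # hypothetical stand-in for a scope marker on the list

@dataclass(frozen=True)
class Formatting:
    """Simplified model of a formatting element entry (hypothetical)."""
    name: str
    attributes: tuple  # sorted (name, value) pairs, so attribute order is ignored

def make_element(name, attrs=None):
    return Formatting(name, tuple(sorted((attrs or {}).items())))

MAX_EXISTING = 2  # identical entries allowed on the list before adding another

def push_formatting(active, element, max_existing=MAX_EXISTING):
    """Append element to the list of active formatting elements, first removing
    the earliest identical entries after the latest marker so that at most
    max_existing of them remain (assumed removal policy)."""
    start = 0
    for i in range(len(active) - 1, -1, -1):
        if active[i] is MARKER:
            start = i + 1  # only entries after the latest marker count
            break
    matches = [i for i in range(start, len(active)) if active[i] == element]
    excess = max(len(matches) - max_existing, 0)
    # Delete from the highest index down so earlier indices remain valid.
    for i in reversed(matches[:excess]):
        del active[i]
    active.append(element)
```

With `max_existing=2`, a fourth identical `<b>` evicts the first one, so at most
three identical entries survive after the latest marker, while entries before a
marker are left alone. Elements that differ only in an attribute value (for
example two <font> tags with different colors) are not treated as duplicates.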

Again, please see
http://lists.w3.org/Archives/Public/public-html/2010Sep/0163.html for how to
deal with removing duplicates.

I think it would make sense to put the limit in the spec, because it would suck
if an HTML5-compliance scoring site like http://html5test.com/ put 4 identical
formatting start tags in a test case and called an implementation
non-conforming.

Received on Wednesday, 13 October 2010 12:50:32 UTC