[Bug 5752] Parsing should be specified for future updates


--- Comment #10 from Rob Burns <rob@robburns.com>  2008-06-15 08:28:05 ---
> Grepping through a few thousand random pages, it's pretty easy to find lots of
> cases of trailing slashes where the author seemingly hasn't got a clue what
> they're doing at all (and therefore wasn't intentionally meaning to get
> self-closing-tag parsing), like:

Phil, thanks for looking into this. You are a wizard when it comes to these
sample surveys.

I admit it is difficult to imagine what these authors are doing with these
tags. However, I don't see any case in these examples where the author needs
the tree constructed so that the content following a tag with a solidus is
added to the tree as a descendant of that element (which is exactly what we
would break by always treating the solidus as marking a self-closing tag).

The kind of example we'd be looking for is a page that uses DOM calls, CSS
selectors, or XSLT transforms that rely on the tag with a solidus being
non-void. For example, the page with all of the <NAMESPACE ... /> tags[1] has
no closing </NAMESPACE> tags whatsoever. From that it is clear that the author
didn't intend to create an ever-deepening nest of NAMESPACE elements that is
never closed, but rather many repeated void NAMESPACE elements. You might argue
that the author intended each NAMESPACE element to have an implicit close tag,
but that would be a difficult case to make here. And even if that were the
case, this content would only break if the author depended on the NAMESPACE
tag being parsed in a particular way (for example, by targeting only one
browser, since they all parse this differently). We would also need to verify
that the author is using CSS or DOM calls that rely on a particular tree
structure resulting from these tags.
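To make the two possible tree shapes for the NAMESPACE case concrete, here is a
toy sketch. It is not the HTML5 tree-construction algorithm. The regex
tokenizer and the names `build_tree` and `max_depth` are hypothetical, and the
only switch is whether a trailing solidus closes the element. When the solidus
is ignored, each unclosed NAMESPACE nests inside the previous one.

```python
import re

# Hypothetical, heavily simplified tokenizer: start tags (with an optional
# trailing solidus), end tags, and text runs. Not a real HTML tokenizer.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>|([^<]+)")


def build_tree(html, solidus_self_closes):
    """Return a nested (name, children) tree.

    If solidus_self_closes is False, a trailing '/' is ignored and the
    element stays open until a matching end tag (or the end of input).
    """
    root = ("#root", [])
    stack = [root]
    for m in TOKEN.finditer(html):
        is_end, name, solidus, text = m.groups()
        if text is not None:
            stack[-1][1].append(text)
        elif is_end:
            # Pop back to the nearest open element with this name, if any.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i][0] == name.lower():
                    del stack[i:]
                    break
        else:
            node = (name.lower(), [])
            stack[-1][1].append(node)
            # Only keep the element open if the solidus is not honoured.
            if not (solidus and solidus_self_closes):
                stack.append(node)
    return root


def max_depth(node, depth=0):
    """Deepest element nesting level below the given node."""
    _, children = node
    kids = [c for c in children if isinstance(c, tuple)]
    return max((max_depth(c, depth + 1) for c in kids), default=depth)


if __name__ == "__main__":
    page = "<NAMESPACE a/><NAMESPACE b/><NAMESPACE c/>"
    print(max_depth(build_tree(page, solidus_self_closes=True)))   # 1: siblings
    print(max_depth(build_tree(page, solidus_self_closes=False)))  # 3: nested
```

Only a page whose scripts or styles depend on the second (nested) shape would
be affected by honouring the solidus.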

So I think the things to look for, to show how we would break content by
treating a solidus in unknown tags as marking a self-closing tag, are:
 1) find tags with a solidus ("<tagname ... />") where there are also
corresponding close tags ("</tagname>") for each.
 2) find places where the author uses CSS, XSLT, DOM calls, etc. that rely on
those tags not being parsed as void elements.
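Criterion 1 could be approximated mechanically over a corpus of pages. Below is
a rough sketch, assuming regex matching is good enough for a survey (it ignores
comments, scripts, and attribute values containing '>'). The helper names
`solidus_vs_end_tags` and `suspicious_tags` are hypothetical. Matching counts
only hint that a page pairs solidus tags with close tags; they don't prove it,
and criterion 2 still needs a human to look at the page's CSS, XSLT, or script.

```python
import re
from collections import Counter

# "<tagname ... />" start tags and "</tagname>" end tags (rough heuristics).
SELF_CLOSING = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*/>")
END_TAG = re.compile(r"</([A-Za-z][A-Za-z0-9]*)\s*>")


def solidus_vs_end_tags(html):
    """Map each tag name written with a solidus to (solidus count, end-tag count)."""
    self_closing = Counter(m.group(1).lower() for m in SELF_CLOSING.finditer(html))
    end_tags = Counter(m.group(1).lower() for m in END_TAG.finditer(html))
    return {name: (n, end_tags[name]) for name, n in self_closing.items()}


def suspicious_tags(html):
    """Tag names where every solidus occurrence could have a matching end tag."""
    return sorted(
        name
        for name, (n, ends) in solidus_vs_end_tags(html).items()
        if ends >= n
    )


if __name__ == "__main__":
    page = "<NAMESPACE a/><NAMESPACE b/><foo x/>bar</foo><br/>"
    print(solidus_vs_end_tags(page))
    print(suspicious_tags(page))
```

A page like the NAMESPACE one, with no close tags at all, would not be flagged.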

I would say that if we don't find the first, then we're not going to find the
second, but either one would at least give us an indication of how big a
problem fixing HTML parsing might cause for existing content. Once we know the
size of the problem, we would be in a better place to decide whether it should
hold up our progress. If we assume that the pages you found here are
representative, I would say it is not a problem at all.

[1]: http://www.malaysiacricket.com/html/s01_home/home.asp


Received on Sunday, 15 June 2008 08:28:39 UTC