Re: Distributed Extensibility

Dan Connolly wrote:
> On Fri, 2007-08-03 at 15:42 +0300, Henri Sivonen wrote:
>> On Aug 2, 2007, at 18:16, Sam Ruby wrote:
> [...]
>>>     * The notion of “enclosing element” is problematic in the face  
>>> of adoption agency algorithms and the like.  The prudent thing to  
>>> do is to define any case where reparenting would change the meaning  
>>> of any element to be a (recoverable) error.  This would affect very  
>>> few users or documents.  It would be a bitch to code in a  
>>> conformance checker, but that’s not the spec’s writer’s concern.  :-)
>> Reparenting is already an error.
> 
> Care to elaborate? If anybody has a moment to walk me/us thru the
> relevant parts of the spec, or just point to it, I'd appreciate it.

I'll take that one.  I'm not sure how much of the spec you are familiar 
with, so if I cover something that you are already aware of, just skim 
over it.  It might be helpful to others.

Much has been said about how requiring quotes around attribute values 
and slashes in empty tags caused XHTML to "fail", but IMHO, that misses 
one of the more interesting stories.  Consider the following fragment:

     <table><tr><td>x</td></tr><o:p>y</o:p></table>

Now load it in your favorite browser.  It may surprise you, but you will 
see a 'y' followed by an 'x' on a separate line.

Note that this fragment is well-formed in the XML sense -- not
namespace-well-formed (the o prefix is never declared), just plain
well-formed, but we'll come back to that in a minute.  So in one sense
it is unambiguous, but in another sense it is horribly wrong: not only
is <o:p> not defined, it isn't something that you would expect to find
in a table.

What's going on here?  Well, the <o:p> element got a foster parent, as 
defined here:

     http://www.whatwg.org/specs/web-apps/current-work/#foster

(you might want to scroll back a few lines from that anchor to see the 
context, namely "anything else" within a <table>)
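
For those who prefer code, here is a rough Python sketch of the idea.
It is a toy model, not the spec's tree-construction algorithm: the Node
class, TABLE_CHILDREN, insert_in_table and render are names I made up
for illustration, and it skips details such as the implied <tbody>.

```python
# Toy model of foster parenting. All names here are hypothetical and
# the rules are heavily simplified compared to the spec.

class Node:
    def __init__(self, name, text=""):
        self.name = name
        self.text = text
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)

    def insert_before(self, child, ref):
        child.parent = self
        self.children.insert(self.children.index(ref), child)


# Simplified assumption: elements a table accepts directly in this toy.
TABLE_CHILDREN = {"caption", "colgroup", "thead", "tbody", "tfoot", "tr"}


def insert_in_table(table, child, errors):
    """Insert child into table, foster parenting anything unexpected."""
    if child.name in TABLE_CHILDREN:
        table.append(child)
    else:
        errors.append("parse error: <%s> not allowed in <table>" % child.name)
        # Foster parent: the node goes immediately before the table.
        table.parent.insert_before(child, table)


def render(node):
    """Serialize the toy tree so the final order is visible."""
    inner = node.text + "".join(render(c) for c in node.children)
    return "<%s>%s</%s>" % (node.name, inner, node.name)


body = Node("body")
table = Node("table")
body.append(table)
errors = []

row = Node("tr")
row.append(Node("td", "x"))
insert_in_table(table, row, errors)
insert_in_table(table, Node("o:p", "y"), errors)

print(render(body))
print(errors)
```

In this sketch the <o:p> node ends up as a sibling just before the
<table>, which is why the 'y' shows up ahead of the 'x'.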

Why would any browser do such an idiotic thing?  Well, the truth is that
they have all been reverse engineering each other for so long that it
may be hard to tell.  These days most people reverse engineer IE, and a
lot of behaviors in IE were themselves reverse engineered from NN.  In
any case, they now all do it.  And any new entrant into this field had
better do it too, lest they render some portion of the web "wrong".

And furthermore, they had better not just render it this way, they had 
better represent it that way in the DOM, as that's what scripts will be 
expecting.

And with that, HTML5 was born.  Instead of everybody reverse engineering
each other and IE over silly things like what "&#10;" means, the
bright idea was to write down one definition and have everybody migrate
towards it.  This isn't as easy as it looks, which just makes it all the 
more valuable to have it done.  As long as the behavior isn't totally 
insane, it is included in the spec.  (Just for fun, search for "Don't 
ask" in the spec).

Now, what does this have to do with the question at hand?  Imagine the
fragment shown above included in a larger document that declares
xmlns:o on both the root element and the table element.  Which
declaration applies to <o:p>?

My answer: the spec should pick one, and I don't particularly care which 
one, but everybody should do it the same way.  And a parse error should 
be generated.
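
To make the ambiguity concrete, here is a small Python sketch.  It is
only an illustration under my own assumptions: the Node class, the
resolve_prefix and foster_parent helpers, and the urn:example: values
are all hypothetical, and it deliberately does not pick a winner.

```python
# Toy illustration only. Node, resolve_prefix, foster_parent and the
# urn:example: namespace values are hypothetical.

class Node:
    def __init__(self, name, xmlns=None):
        self.name = name
        self.xmlns = xmlns or {}  # prefix -> namespace declared on this node
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)


def resolve_prefix(node, prefix):
    """Walk up the ancestors and return the nearest declaration of prefix."""
    while node is not None:
        if prefix in node.xmlns:
            return node.xmlns[prefix]
        node = node.parent
    return None


def foster_parent(table, child):
    """Move child out of table so it sits just before table in its parent."""
    table.children.remove(child)
    siblings = table.parent.children
    siblings.insert(siblings.index(table), child)
    child.parent = table.parent


root = Node("html", {"o": "urn:example:root"})
table = Node("table", {"o": "urn:example:table"})
para = Node("o:p")
root.append(table)
table.append(para)

errors = []
as_written = resolve_prefix(para, "o")  # scope as it appears in the source
foster_parent(table, para)
as_built = resolve_prefix(para, "o")    # scope in the tree after reparenting

if as_written != as_built:
    errors.append("parse error: reparenting changed the namespace of <o:p>")

print(as_written, as_built, errors)
```

The two lookups disagree: the source text says the table's declaration
is in scope, while the tree that scripts will see says the root's is.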

Henri's point is that a parse error is already generated in this
situation, and in every similar one.

That's fine; I would suggest that this case deserves two parse errors,
then.  Parse errors are cheap.  :-)

- Sam Ruby

Received on Friday, 3 August 2007 23:32:42 UTC