Re: several messages about New Vocabularies in text/html

On Fri, 4 Apr 2008, Julian Reschke wrote:
> >
> > The key to avoiding the problem is the variant on Simon's idea, which 
> > is to hard-code all the HTML element names and cause them to 
> > automatically close the "namespaced" scope.
> > 
> > So e.g.:
> > 
> >    <math><b>
> > 
> > ...is treated as:
> > 
> >    <math></math><b>
> 
> That may work for MathML and SVG vs HTML5 as of today, but makes future 
> changes extremely hard, because of potential name clashes.

Indeed, and we already have name clashes (one, for MathML; at least half 
a dozen, for SVG).

This isn't the design I would have picked if I were designing this in a 
vacuum. However, we have to deal with the existing content. This has been 
an underlying theme for the HTML5 effort since we started in 2004. 
(Actually, since even earlier, with the Web Forms 2 work going back to 
late 2003.)

If we're willing to consider solutions that _don't_ take the existing 
legacy content into account, we're better off doing a more thorough job 
and going with something like XHTML2, XML, XML Namespaces, and so forth.
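To make the quoted breakout rule concrete, here is a minimal Python sketch, assuming a toy tree builder that only sees start tags. It is not the HTML5 parsing algorithm. `HTML_ELEMENTS` is a small hypothetical subset of HTML element names, and `build` is a helper invented for illustration. Inside `<math>`, any start tag whose name is a known HTML element first closes every open element up to and including `<math>`:

```python
# Toy sketch of "HTML element names close the namespaced scope".
# Not the HTML5 tree builder; HTML_ELEMENTS is a hypothetical subset.
HTML_ELEMENTS = {"b", "i", "p", "div", "span", "table", "tr", "td"}


def build(start_tags):
    """Serialize a sequence of start-tag names, applying the breakout rule."""
    out = []    # serialized output
    stack = []  # currently open elements, innermost last
    for name in start_tags:
        name = name.lower()
        if "math" in stack and name in HTML_ELEMENTS:
            # Close everything up to and including the open <math>.
            while stack:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == "math":
                    break
        stack.append(name)
        out.append(f"<{name}>")
    return "".join(out)


print(build(["math", "b"]))        # <math></math><b>
print(build(["math", "mi", "b"]))  # <math><mi></mi></math><b>
```

The flip side is the clash Julian describes: under this rule, a future MathML element that reused an HTML element name could never appear inside `<math>`.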


> > > As such, I don't believe that this meets the stated requirement of 
> > > "Ability for an author to unilaterally extend the language to 
> > > address problems we are currently unaware of and that therefore are 
> > > not covered by existing functionality".
> > 
> > Actually this proposal isn't at all intended for use as an 
> > author-level extension syntax. HTML has long used the class="" 
> > attribute for this, and
> 
> HTML has also allowed adding new element names

HTML has never allowed custom element names.


> > I propose to continue using this, along with a new set of attributes 
> > for name-value pairs using the data- prefix (and some corresponding 
> > DOM infrastructure). For an example of this, please see:
> > 
> >    http://wiki.whatwg.org/wiki/CustomData
> > 
> > In practice for most HTML Web application authors this ends up being 
> > more useful, and certainly easier to make accessible, than having the 
> > authors mix in entirely arbitrary private vocabularies into their 
> > markup.
> 
> What exactly do you mean by "more accessible" here?

If someone has data in a grid that they want to render, there are several 
ways they could go about it. In a pure XML/XHTML workflow, they could do:

   ...
   <html:p>And the data is:</html:p>
   <my:board>
    <my:line input="5">
     <my:point output="2" bias="3"/>
     <my:point output="1" bias="2"/>
     <my:point output="4" bias="2"/>
    </my:line>
    ...

...with some CSS (and/or XBL) to pretty things up, maybe combined with 
some ARIA to make it work in assistive technologies (ATs) that support it. 
Or, they could do this (either in XHTML or text/html HTML5) with the new 
extensions:

   ...
   <p>And the data is:</p>
   <table class="my-board">
    <tr class="my-line" data-input="5">
     <td class="my-point" data-bias="3">2</td>
     <td class="my-point" data-bias="2">1</td>
     <td class="my-point" data-bias="2">4</td>
    </tr>
    ...

...with again some CSS (and/or XBL). Not only would they not need ARIA to 
make this accessible to non-visual users, but the page would also work 
(be accessible) in non-CSS UAs such as Lynx, and its basic structure would 
be trivially comprehensible to search engines and other robots.
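As a rough illustration of the robot-friendliness point, the `data-*` version of the grid can be read back by a generic HTML parser that knows nothing about the `my-` vocabulary. This is a minimal sketch using Python's standard `html.parser` module; the `BoardParser` class and the shape of its output are hypothetical, and the markup is a single row of the grid using the values from the XML version:

```python
from html.parser import HTMLParser

MARKUP = """
<p>And the data is:</p>
<table class="my-board">
 <tr class="my-line" data-input="5">
  <td class="my-point" data-bias="3">2</td>
  <td class="my-point" data-bias="2">1</td>
  <td class="my-point" data-bias="2">4</td>
 </tr>
</table>
"""


class BoardParser(HTMLParser):
    """Collect rows of (output, bias) points keyed by class="" and data-*."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per <tr class="my-line">
        self._point = None  # the <td class="my-point"> being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser delivers attribute names lowercased
        classes = (attrs.get("class") or "").split()
        if tag == "tr" and "my-line" in classes:
            self.rows.append({"input": attrs.get("data-input"), "points": []})
        elif tag == "td" and "my-point" in classes and self.rows:
            self._point = {"output": "", "bias": attrs.get("data-bias")}

    def handle_data(self, data):
        # The visible cell text is the point's output value.
        if self._point is not None:
            self._point["output"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td" and self._point is not None:
            self.rows[-1]["points"].append(self._point)
            self._point = None


parser = BoardParser()
parser.feed(MARKUP)
parser.close()
print(parser.rows)
```

The same cell text that this parser reads as the output value is what a non-CSS browser such as Lynx would display in the table.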


> Anyway, if "xmlns" really can't be used, there's nothing stopping us 
> minting a new attribute name that has the same effect.

I did consider that, but it still fails in the face of early adopter cargo 
cult behaviour, and loses compatibility with XML, which is one of the main 
reasons to try this in the first place. I mean, if we aren't going to get 
syntax compatibility with XML, the details of the syntax become somewhat 
secondary, and you might as well turn

   <foo ns="bar" baz="quux">

...into:

   <span class="bar-foo" data-baz="quux">

...since the difference is purely syntactic at that point.
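How purely syntactic that mapping is shows up clearly in code. A minimal Python sketch, assuming the hypothetical `ns=""` attribute from the example above; `to_class_and_data` is a helper invented for illustration, not part of any proposal:

```python
from html import escape


def to_class_and_data(tag, attrs):
    """Rewrite a hypothetical <tag ns="..." ...> into class="" plus data-*.

    The ns value and element name become one class token ("ns-tag");
    every other attribute is kept, with a data- prefix added.
    """
    attrs = dict(attrs)  # copy so the caller's dict is untouched
    ns = attrs.pop("ns")
    parts = [f'class="{escape(ns)}-{escape(tag)}"']
    for name, value in attrs.items():
        parts.append(f'data-{name}="{escape(value)}"')
    return "<span " + " ".join(parts) + ">"


print(to_class_and_data("foo", {"ns": "bar", "baz": "quux"}))
# <span class="bar-foo" data-baz="quux">
```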


> > > Simply specing that this "in namespace" state is case sensitive
> > 
> > Actually it turns out to make things really complicated. For example, 
> > consider these five cases:
> > 
> >   1 <math><B>
> >   2 <math><FOO>
> >   3 <math><mtext><B>
> >   4 <math><mtext><FOO>
> >   5 <math><mtext><mglyph>
> > 
> > In cases 1 and 2, we need to recognise that the B element is a known 
> > HTML element, but we need to put the FOO element into the MathML 
> > namespace. In cases 3 to 5, we want <B> and <FOO> to become lowercase 
> > HTML elements, but we still want the <mglyph> to end up in the MathML 
> > namespace. If we were case-sensitive, then we'd want to recognise <B> 
> > as <b> but not <MGLYPH> as <mglyph>. It ends up being far simpler to 
> > just support the element names case-insensitively and fix them up 
> > afterwards.
> 
> Out of curiosity: where does the requirement to include HTML (not XHTML) 
> into MathML come from?

I don't have a note of the precise e-mail that listed the problem being 
solved here, but it should be in this list somewhere:

   http://www.whatwg.org/issues/#html-parsing-rules-namespaces-discussion

If you mean why don't we simply use XHTML, that's a language design 
decision. We already have a language that can combine MathML and XHTML, 
it's XML. There's no point solving the problem twice the same way. It 
would also be extremely confusing to most regular authors if HTML elements 
behaved one way in certain parts of their documents and another way a few 
lines lower down. For example, it would make copy-and-paste within a 
single document fail to work faithfully, which is far worse than 
copy-and-paste across documents of different MIME types not working.
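Returning to the case-sensitivity question quoted earlier, here is a minimal Python sketch of the "match case-insensitively, then fix up" approach for the `<math>` and `<math><mtext>` inputs. The name sets are small hypothetical subsets, `classify` is a helper invented for illustration, and it only decides the namespace of the last tag from its ancestors; following the quoted text, `mglyph` is treated as the one name that stays MathML inside `mtext`:

```python
# Hypothetical subsets for illustration; real lists are much longer.
HTML_NAMES = {"b", "i", "p", "span"}
TEXT_POINTS = {"mtext"}      # MathML elements whose children are parsed as HTML
MATHML_IN_TEXT = {"mglyph"}  # ...except these, which stay MathML


def classify(path):
    """Return (namespace, name) for the last tag in a path of open tags.

    Matching is case-insensitive; names are emitted lowercased, which is
    the whole fix-up for MathML (SVG's camelCase names, such as
    foreignObject, would also need a lookup table).
    """
    *ancestors, name = [n.lower() for n in path]
    parent = ancestors[-1] if ancestors else None
    if parent in TEXT_POINTS:
        return ("mathml" if name in MATHML_IN_TEXT else "html"), name
    if "math" in ancestors:
        return ("html" if name in HTML_NAMES else "mathml"), name
    return "html", name


for path in (["math", "B"], ["math", "FOO"], ["math", "mtext", "B"],
             ["math", "mtext", "FOO"], ["math", "mtext", "MGLYPH"]):
    print(path, classify(path))
```

A case-sensitive version would have to accept `<B>` as `<b>` while rejecting `<MGLYPH>` as `<mglyph>`, which is exactly the special-casing the quoted text wants to avoid.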

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 4 April 2008 08:40:21 UTC