Re: several messages about New Vocabularies in text/html

On Thu, 3 Apr 2008, Sam Ruby wrote:
> Ian Hickson wrote on 04/03/2008 05:20:55 PM:
> >
> > I now have some rough notes of what I think we can do to the parser to 
> > handle SVG and MathML here:
> >
> >
> My quick read indicates that this suffers the same issue that you 
> previously referred to as 'fatal' when Henri suggested using <math> as a 
> trigger.

The key to avoiding the problem is the variant on Simon's idea, which is 
to hard-code all the HTML element names and cause them to automatically 
close the "namespaced" scope.
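
A rough sketch of that rule in Python, purely for illustration. The 
function name, the scope stack and the abbreviated element list are all 
assumptions of this sketch, not part of the notes; a real parser would 
hard-code every HTML element name.

```python
# Abbreviated, hypothetical list; the real table would name every HTML element.
HTML_ELEMENT_NAMES = frozenset({"b", "i", "p", "div", "span", "table", "img"})

# Root elements that open a "namespaced" scope.
FOREIGN_ROOTS = {"math": "mathml", "svg": "svg"}


def namespace_for_start_tag(tag_name, scopes):
    """Return the namespace a start tag lands in, updating scopes in place.

    scopes is a stack of namespace labels, innermost last, and always
    has "html" at the bottom.
    """
    name = tag_name.lower()
    if name in HTML_ELEMENT_NAMES:
        # A known HTML element automatically closes any namespaced scope.
        while scopes[-1] != "html":
            scopes.pop()
        return "html"
    if name in FOREIGN_ROOTS:
        scopes.append(FOREIGN_ROOTS[name])
    # Anything else stays in whatever scope is currently open.
    return scopes[-1]
```

Starting from scopes = ["html"], feeding it "math", then "mi", then "b" 
puts the first two in the MathML scope and the last back in HTML.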

So e.g.:

   <math><b> treated as:


> I think it could be improved by an additional check for an attribute 
> named xmlns; and furthermore wonder if that additional check were in 
> place would a check on the element name even be necessary.

Having both would be unnecessary and would cause tree construction 
behaviour to depend on attributes, which is something I've been trying to 
avoid (though we do have it for <input type=hidden>). Having it only on 
the attribute would be effectively the same as requiring both, since the 
only case where a MathML or SVG element could be found in an HTML context 
is when the root <math> or <svg> element is found.

> I also don't see anything which would begin to address xlink.

I plan to deal with attribute namespaces and case in the "if namespace is 
svg, apply case fixups" bit.
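
As a hedged illustration of what such a fixup step might look like, here 
is a Python sketch. The table contents are a small sample of real SVG and 
XLink names, but the function name, the data shapes and the choice of 
which entries to list are assumptions of this sketch only.

```python
SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Lowercased name -> mixed-case name used by SVG (sample entries only).
SVG_ELEMENT_FIXUPS = {
    "foreignobject": "foreignObject",
    "clippath": "clipPath",
    "lineargradient": "linearGradient",
    "textpath": "textPath",
}
SVG_ATTRIBUTE_FIXUPS = {
    "viewbox": "viewBox",
    "preserveaspectratio": "preserveAspectRatio",
}
# Prefixed attribute names that are mapped to a namespace (sample entries only).
NAMESPACED_ATTRIBUTES = {
    "xlink:href": (XLINK_NS, "href"),
    "xlink:title": (XLINK_NS, "title"),
}


def fix_up_svg(element_name, attributes):
    """Apply case and namespace fixups to an element already known to be SVG.

    element_name and the keys of attributes are assumed to be lowercased
    by the tokeniser. Returns (fixed_name, {(namespace, local_name): value}).
    """
    fixed_name = SVG_ELEMENT_FIXUPS.get(element_name, element_name)
    fixed_attributes = {}
    for name, value in attributes.items():
        if name in NAMESPACED_ATTRIBUTES:
            key = NAMESPACED_ATTRIBUTES[name]
        else:
            key = (None, SVG_ATTRIBUTE_FIXUPS.get(name, name))
        fixed_attributes[key] = value
    return fixed_name, fixed_attributes
```

Names not in the tables pass through unchanged, so only the non-lowercase 
names need an entry.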

> This proposal makes references to "case fixups" and lists specifics, which 
> implies to me that it is tightly coupled to a snapshot of these specific 
> vocabularies as they exist at the moment.  A consequence of that 
> decision is that this vocabulary may need to be updated every time those 
> vocabularies are revised.

Right, at least for non-lowercase elements (not a problem for MathML so 
far, though it could be a problem for SVG; however, SVG is very 
inconsistent in its naming schemes and could easily just use lowercase 
elements without ruining the language aesthetics, so if they want to deal 
with this going forward they could easily do so).

In practice, new elements and attributes don't just get supported in 
browsers without someone having to write code to support them, and so 
requiring that the person implementing the feature also add another 
special case to a table in their parser is a non-issue.

> As such, I don't believe that this meets the stated requirement of 
> "Ability for an author to unilaterally extend the language to address 
> problems we are currently unaware of and that therefore are not covered 
> by existing functionality".

Actually this proposal isn't at all intended for use as an author-level 
extension syntax. HTML has long used the class="" attribute for this, and 
I propose to continue using this, along with a new set of attributes for 
name-value pairs using the data- prefix (and some corresponding DOM 
infrastructure). For an example of this, please see:

In practice for most HTML Web application authors this ends up being more 
useful, and certainly easier to make accessible, than having the authors 
mix in entirely arbitrary private vocabularies into their markup.
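
To make the data- idea concrete, here is a small Python sketch that pulls 
the name-value pairs out of markup using the standard library's 
html.parser. The class name, helper name and the attribute names in the 
usage note are hypothetical, and this only stands in for the DOM 
infrastructure mentioned above; it is not that infrastructure.

```python
from html.parser import HTMLParser


class DataAttributeCollector(HTMLParser):
    """Collect data-* attributes from each start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        prefix = "data-"
        data = {
            name[len(prefix):]: value
            for name, value in attrs
            if name.startswith(prefix)
        }
        if data:
            self.found.append((tag, data))


def collect_data_attributes(markup):
    """Return a list of (tag, {name: value}) for elements with data-* attributes."""
    collector = DataAttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.found
```

For instance, collect_data_attributes('<div class="ship" data-ship-id="92432">x</div>') 
yields one entry for the div, with "ship-id" mapped to "92432".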

> triggering transitions based on an attribute named xmlns would go a long 
> way towards addressing this requirement.

As I've mentioned several times, I don't see any way we can do this on the 
Web given the ridiculous numbers of elements of all kinds already in 
text/html content on the Web that have xmlns="" attributes with various 
values. I agree that in an ideal world it would be a great idea. But we're 
not in an ideal world. Reality must be dealt with on its terms, not ours.

> Simply specing that this "in namespace" state is case sensitive

Actually it turns out to make things really complicated. For example, 
consider these five cases:

  1 <math><B>
  2 <math><FOO>
  3 <math><mtext><B>
  4 <math><mtext><FOO>
  5 <math><mtext><mglyph>

In cases 1 and 2, we need to recognise that the B element is a known HTML 
element, but we need to put the FOO element into the MathML namespace. In 
cases 3 to 5, we want <B> and <FOO> to become lowercase HTML elements, 
but we still want the <mglyph> to end up in the MathML namespace. If we 
were case-sensitive, then we'd want to recognise <B> as <b> but not 
<MGLYPH> as <mglyph>. It ends up being far simpler to just support the 
element names case-insensitively and fix them up afterwards.
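
A minimal Python sketch of "match case-insensitively, fix up afterwards", 
applied to the cases listed above. Everything named here is an assumption 
for illustration: the abbreviated element lists, the idea of passing in 
the parent element's name, and the (empty) MathML fixup table.

```python
# Abbreviated, hypothetical tables.
HTML_ELEMENT_NAMES = frozenset({"b", "i", "p", "span"})
# MathML elements that stay MathML even inside <mtext>.
MATHML_IN_TEXT = frozenset({"mglyph"})
# MathML names are all lowercase so far, so nothing needs fixing up yet.
MATHML_CASE_FIXUPS = {}


def classify_start_tag(tag_name, parent_name):
    """Return (namespace, element_name) for a start tag seen below <math>."""
    name = tag_name.lower()  # match case-insensitively first
    if name in HTML_ELEMENT_NAMES:
        # Known HTML elements always end up as lowercase HTML elements.
        return ("html", name)
    if parent_name == "mtext" and name not in MATHML_IN_TEXT:
        # Inside <mtext>, unknown names are treated as HTML too.
        return ("html", name)
    # Otherwise it is MathML; restore the canonical case afterwards.
    return ("mathml", MATHML_CASE_FIXUPS.get(name, name))
```

With this shape, <MGLYPH> and <mglyph> take the same path, which is the 
simplification described above.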

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
                          U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 4 April 2008 04:46:08 UTC