[whatwg] The problems with namespaces in text/html

On Nov 5, 2006, at 01:19, Elliotte Harold wrote:

> Henri Sivonen wrote:
>
>> Anne is talking about the text/html serialization, which is
>> supposed to be parsed using an HTML5 parser. It is a
>> special-purpose alternative serialization for a subset of possible
>> infosets--like RELAX NG Compact Syntax. Please ignore the
>> superficial syntactic similarity to XML 1.0.
>
> Does that subset include MathML?

Not yet. Whether it should is what is being discussed.

> However if the plan is to mix in entire additional languages, then  
> I think this is driving off a cliff. MathML and MathML tools are  
> designed under the assumption that they can rely on well-formedness  
> and namespaces. Integrating MathML with HTML absolutely needs this.

You wouldn't be able to feed MathML-enabled HTML5 directly to MathML
tools that use an XML parser. You'd either have to use an
HTML5-to-XHTML5 converter to create an intermediate XML 1.0
serialization that can be fed to an XML parser, or you could optimize
away the serialization and plug an HTML5 parser directly into the XML
processing pipeline, the way TagSoup is used.
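
As a rough illustration of the converter route, here is a minimal
Python sketch. It rests on several assumptions: Python's html.parser
stands in for an HTML5 parser (it is not one, and the HTML5 parsing
algorithm is not implemented here), the names SoupToTree and
html_to_xml are made up for this example, and the input is assumed to
have a single root element. It only shows the shape of the pipeline:
tag soup in, tree built, XML 1.0 serialization out, which an ordinary
XML parser then accepts.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have an end tag in text/html.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "param", "source", "track", "wbr"}


class SoupToTree(HTMLParser):
    """Toy stand-in for an HTML5 parser: builds an ElementTree from tag soup."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)  # void elements close immediately
        else:
            self.open.append(tag)

    def handle_startendtag(self, tag, attrs):
        # "<x/>" syntax: open and close in one step.
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in self.open:
            return  # stray end tag: ignore it
        # Implicitly close everything opened after the matching start tag.
        while self.open:
            top = self.open.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.open:  # text outside the root element is dropped
            self.builder.data(data)

    def finish(self):
        self.close()  # flush any buffered text
        while self.open:  # close whatever the author left open
            self.builder.end(self.open.pop())
        return self.builder.close()


def html_to_xml(soup: str) -> str:
    """Return a well-formed XML 1.0 serialization of the parsed tree."""
    parser = SoupToTree()
    parser.feed(soup)
    return ET.tostring(parser.finish(), encoding="unicode")


soup = "<div><p>Unclosed paragraph<br>with a <b>bold run</div>"
xml_text = html_to_xml(soup)
reparsed = ET.fromstring(xml_text)  # an XML parser accepts the output
print(xml_text)
```

Skipping the serialization step would amount to handing the tree (or
the equivalent event stream) from finish() straight to the XML tools
instead of calling ET.tostring and reparsing.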

> It sounds to me like the working group is considering the needs of  
> thick-client web browsers

The WHAT WG is very much biased towards Gecko, Presto, WebKit and  
Trident as the consumers of documents.

> and people hand-authoring HTML in text editors to the complete  
> exclusion of every other community and use case.

Personally, I think MathML is so hopelessly verbose for hand
authoring that this really shouldn't be about enabling hand-authored
MathML-in-HTML5. It should be about enabling MathML-in-HTML5 (perhaps
generated by a future version of itex2mml or similar) to be served
through content management systems that are not built around a SAX
pipeline, an XML tree API or XSLT, but are built as tag soup systems
and simply cannot guarantee well-formedness. I mean systems like
WordPress and Movable Type.

> Please prove me wrong. If it's not true that you're planning on  
> sending mixed HTML and MathML documents on the wire without  
> namespaces or perhaps even well-formedness, then please say so; but  
> so far I'm not hearing anyone deny that.

A conforming HTML5 byte stream is *never* a well-formed XML 1.0 byte  
stream. However, for every *conforming* HTML5 byte stream there  
should (in my opinion) exist (in the mathematical sense of existence)  
an XML 1.0 byte stream that parses into the same infoset. (If this is  
not the case, I consider it a spec bug that needs to be fixed.)

So far what has been suggested is that the MathML elements parsed out  
of an HTML5 byte stream would be in the MathML namespace in the infoset.
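
To make that concrete, here is a small Python sketch of a tree builder
that puts elements into namespaces even though the source carries no
xmlns declarations. The rule it uses (a "math" start tag switches to
the MathML namespace and descendants inherit it) is my assumption for
illustration, not something the spec says. html.parser again stands
in for an HTML5 parser, the input is assumed to be properly nested,
and the names NamespacingParser and parse_with_namespaces are
hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


class NamespacingParser(HTMLParser):
    """Assigns namespaces while building the tree (assumed rule, see above)."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.stack = []  # (local name, namespace) of open elements

    def handle_starttag(self, tag, attrs):
        inherited = self.stack[-1][1] if self.stack else XHTML_NS
        ns = MATHML_NS if tag == "math" else inherited
        self.stack.append((tag, ns))
        # ElementTree spells namespaced names as "{uri}local".
        self.builder.start(f"{{{ns}}}{tag}", {k: (v or "") for k, v in attrs})

    def handle_endtag(self, tag):
        # Only properly nested end tags are handled in this sketch.
        if self.stack and self.stack[-1][0] == tag:
            name, ns = self.stack.pop()
            self.builder.end(f"{{{ns}}}{name}")

    def handle_data(self, data):
        if self.stack:
            self.builder.data(data)


def parse_with_namespaces(source: str) -> ET.Element:
    parser = NamespacingParser()
    parser.feed(source)
    parser.close()
    return parser.builder.close()


source = "<p>Sum: <math><mi>x</mi><mo>+</mo><mn>1</mn></math></p>"
tree = parse_with_namespaces(source)
for element in tree.iter():
    print(element.tag)

# The namespaced tree serializes to XML 1.0 that parses back to the same names.
round_trip = ET.fromstring(ET.tostring(tree, encoding="unicode"))
```

The serialized form declares both namespaces explicitly, so an XML
tool downstream sees ordinary namespaced MathML inside XHTML.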

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

Received on Saturday, 4 November 2006 17:14:51 UTC