Re: Supporting MathML and SVG in text/html, and related topics

On 16 Apr 2008, at 11:14, Henri Sivonen wrote:
> On Apr 16, 2008, at 10:47, Paul Libbrecht wrote:
>> why is the whole HTML5 effort not a movement towards a really  
>> enhanced parser instead of trying to redefine fully HTML successors?
>
> text/html has immense network effects both from the deployed base  
> of text/html content and the deployed base of software that deals  
> with text/html. Failing to plug into this existing network would be  
> extremely bad strategy.

I'm not saying the effort should fail to plug into that network, nor
that an enhanced parser should ignore it; it certainly should care.

> In fact, the reason why the proportion of Web pages that get parsed  
> as XML is negligible is that the XML approach totally failed to  
> plug into the existing text/html network effects[...]

My hypothesis here is that this problem is mostly a parsing problem
and not a model problem. HTML5 mixes the two.
There are tools today that convert quite a lot of text/html pages
(whose compliance is user-defined as "it works in my browser") to an
XML stream; NekoHTML is one of them. The goal would be to formalize
this parsing, and just this parsing.
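
To make concrete what I mean by "just this parsing", here is a rough
sketch in Python, using only the standard library. It is not how any
existing tool or the HTML5 algorithm works; the class name SoupToXml,
the function html_to_xml, the short VOID list and the recovery rules
(close void elements immediately, close forgotten elements when an
outer end tag arrives, ignore stray end tags) are all my own
assumptions for illustration. The point is only that the lenient
parser produces events, and the model behind it stays a plain XML tree.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumed, incomplete list of elements that never take an end tag.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class SoupToXml(HTMLParser):
    """Hypothetical helper: replays lenient HTML tokens into an XML tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # stack of elements still waiting for an end tag

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) get an empty value.
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close any elements the author left open inside this one.
        while self.open_tags:
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # drop text outside the root element
            self.builder.data(data)

    def to_xml(self, html):
        self.feed(html)
        self.close()
        # Close whatever is still open at end of input.
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        return ET.tostring(self.builder.close(), encoding="unicode")


def html_to_xml(html):
    """Convert a tag-soup fragment with a single root to well-formed XML."""
    return SoupToXml().to_xml(html)


if __name__ == "__main__":
    # <b> and <br> are never closed in the input.
    print(html_to_xml("<p>Hello <b>world<br></p>"))
```

The output can be handed to any ordinary XML parser, which is exactly
the separation between parsing and model that I am arguing for.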

>> Being an enhanced parser (that would use a lot of context info to  
>> be really hand-author supportive) it would define how to parse  
>> better an XHTML 3 page, but also MathML and SVG as it does  
>> currently... It has the ability to specify very readable encodings  
>> of these pages.
>>
>> It could serve as a model for many other situations where XML
>> parsing is useful but its strictness bites some.
>
> Anne has been working on XML5, but being able to parse any
> well-formed stream to the same infoset as an XML 1.0 parser and being
> able to parse existing text/html content in a backwards-compatible  
> way are mutually conflicting requirements. Hence, XML5 parsing  
> won't be suitable for text/html.

I think it should be possible to minimize the conflicts if such
parsing is contextualized well. XML5 looks like a generic attempt at
making generic XML parsing more flexible, which clearly adds too
little flexibility.

>> Currently HTML5 defines at the same time parsing and the model and  
>> this is what can cause us to expect that XML is getting weaker. I  
>> believe that the whole model-definition work of XML is rich, has  
>> many libraries, has empowered a lot of great developments and it  
>> is a bad idea to drop it instead of enriching it.
>
> The dominant design of non-browser HTML5 parsing libraries is  
> exposing the document tree using an XML parser API. The non-browser  
> HTML5 libraries, therefore, plug into the network of XML libraries.  
> For example, Validator.nu's internals operate on SAX events that  
> look like SAX events for an XHTML5 document. This allows  
> Validator.nu to use libraries written for XML, such as oNVDL and  
> Saxon.

So, apart from needing yet another XHTML version to accommodate all
wishes, I think it would be much saner for browser implementations
and the related specifications to rely on an XML-based model of HTML
(as the DOM is) instead of a coupled parsing-and-modelling
specification that is interpreted differently in different places.

paul

Received on Wednesday, 16 April 2008 09:59:01 UTC