Re: Use cases

On 05/01/2011 14:40, Anne van Kesteren wrote:
> I think that resources need to be processed unambiguously. Having
> resources processed sometimes as XML and sometimes as HTML depending on
> the user agent is very fragile and does not lead to interoperability.

The problem is that it will happen anyway. People don't pass pdf to an
html renderer: the formats are obviously different, and if you try it, it
obviously doesn't work. But because of the way html5 is designed, they
will pass xml-syntax html5 to the html5 parser, and it will work most of
the time; when it fails you get no error, just weird, silent data
corruption.

No one (I think) is going to suggest that you get errors from the html 
parser, so the other choice (other than just accepting the data 
corruption and documenting how to avoid it) is to make the result of 
parsing less corrupt.

I think trying to parse arbitrary xml (with namespaces, and doctypes 
and...) with the html5 parser is a non-starter, but the foreign content 
mode does almost exactly the right thing, except:

a) it can only be started by math and svg
b) it aborts on nested html elements
c) it cannot be used for the whole document.
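The silent corruption, and why foreign content mode is close to what is wanted, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `ToyHtml5Builder`, `html_parse` and the trimmed `VOID` set are hypothetical names invented here, and the builder models only one html5 rule (a trailing `/>` is ignored on non-void html elements but honoured inside svg/math). It is not the real tree-construction algorithm, which among other things breaks out of foreign content on certain html elements (point b). The xml reading uses the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: a trimmed subset of html void elements, for illustration only.
VOID = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}
# Elements that switch this toy builder into "foreign content".
FOREIGN_ROOTS = {"svg", "math"}


class Node:
    def __init__(self, tag, foreign=False):
        self.tag = tag
        self.foreign = foreign  # True for svg/math and everything nested inside them
        self.children = []      # child Nodes and text strings


class ToyHtml5Builder(HTMLParser):
    """Hypothetical toy builder: "/>" empties an element only if it is void
    or in foreign content. Not the WHATWG tree-construction algorithm."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.open_elements = [self.root]

    def _is_foreign(self, tag):
        return self.open_elements[-1].foreign or tag in FOREIGN_ROOTS

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self._is_foreign(tag))
        self.open_elements[-1].children.append(node)
        if node.foreign or tag not in VOID:
            self.open_elements.append(node)  # element stays open until its end tag

    def handle_startendtag(self, tag, attrs):
        if self._is_foreign(tag) or tag in VOID:
            # "/>" honoured: append an empty element, parent stays current.
            self.open_elements[-1].children.append(Node(tag, self._is_foreign(tag)))
        else:
            # Non-void html element: "/>" silently ignored, element left open.
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; stray end tags are ignored.
        for i in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[i].tag == tag:
                del self.open_elements[i:]
                return

    def handle_data(self, data):
        self.open_elements[-1].children.append(data)


def serialize(node):
    if isinstance(node, str):
        return node
    inner = "".join(serialize(child) for child in node.children)
    if node.tag == "#document":
        return inner
    if node.tag in VOID and not node.foreign:
        return f"<{node.tag}>"
    return f"<{node.tag}>{inner}</{node.tag}>"


def html_parse(markup):
    builder = ToyHtml5Builder()
    builder.feed(markup)
    builder.close()
    return serialize(builder.root)


if __name__ == "__main__":
    markup = "<p>a<span/>b</p>"
    span = ET.fromstring(markup).find("span")
    print("XML :", repr(span.text), "inside span,", repr(span.tail), "after it")
    # XML : None inside span, 'b' after it
    print("HTML:", html_parse(markup))
    # HTML: <p>a<span>b</span></p>
    print("SVG :", html_parse("<svg><circle/>x</svg>"))
    # SVG : <svg><circle></circle>x</svg>
```

The same bytes put `b` after an empty `span` when read as xml, but inside a still-open `span` under the toy html rules, and neither reading reports an error. Inside `svg` the `/>` is respected, which is the foreign content behaviour described in the list.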

It seems that really only (c) is problematic, as finding a switch that
would allow an xml syntax (interpreting /> as empty in particular) might
be difficult. But (a) could be solved by simply dropping that restriction,
and there have been various suggestions for solving (b), such as the
"like svg" change proposal or adding <xml>. There could of course be
objections to these, but they are at the level of different people
weighing different priorities; there are not really any deep technical
problems.


Received on Wednesday, 5 January 2011 15:17:30 UTC