Re: Exploring new vocabularies for HTML

On Mar 31, 2008, at 12:01, David Carlisle wrote:
>> The right way to do either is to run an HTML5 parser.
> I don't see how that is likely to happen while the "html parser" is
> simply that, with so many hard coded rules for html elements.

This "exploring new vocabularies" effort is about adding rules for SVG  
and MathML as well.

> If the parsing was abstracted away from html and then some schema
> language was used to specify html5 in terms of that abstraction,
> perhaps other languages could at least consider whether they wanted to
> offer lax "html-style" parsing in addition to xml.

If MathML is covered specifically, why would it matter that an
off-the-shelf parser also comes with built-in knowledge of HTML? It's
not as if executable size were critical for math processing software.

> This is essentially how John Cowan's TagSoup works. Now it may be
> that you've looked at existing behaviour and decided the only way to
> model that is to build in special rules everywhere. If that's the
> case, so be it, but that severely limits the usefulness of such a
> parser in a non-html context.

It is easier to define direct tree builder rules than to have an  
additional formalism for defining them.

>> We can ask browsers to use the XML serialization for clipboard export
>> on platforms that have a pre-existing deployed XML-based clipboard
>> flavor for MathML
> yes and you would also need to ask all editing systems not to generate
> <math>1+2=3</math> so that what they produce could be used as mathml
> without having to pass it to a browser and cut it out. The simplest
> way to ensure that editors don't produce such corruption is not to
> imply that it is legal in the first place. It offers very little
> benefit to anyone, and massive opportunities for incompatibility with
> the past and corruption of data (where the system does not imply the
> element structure the author expected) in the future.

Whether it offers a benefit depends largely on whether the tag
inference is so good that it actually makes MathML-in-text/html
human-writable. If not, going half-way so that people still need
something like iTeX2MML would be pointless and we might as well expect
converters to include tags explicitly.
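
To make the tag inference question concrete, here is a minimal sketch of
what inferring token elements for input such as <math>1+2=3</math> could
look like. It is purely illustrative: the function name infer_tokens and
its three rules (digits become <mn>, letters become <mi>, anything else
becomes <mo>) are assumptions made up for this example, not the HTML5
parsing algorithm and not the behaviour of any existing parser.

```python
import html
import re

# Hypothetical inference rules, invented for illustration only:
#   a run of digits (optionally with a decimal part) -> <mn>
#   a run of ASCII letters                           -> <mi>
#   any other non-space character                    -> <mo>
TOKEN = re.compile(r"(\d+(?:\.\d+)?)|([A-Za-z]+)|(\S)")


def infer_tokens(text):
    """Wrap each token of bare math text in a MathML token element."""
    parts = []
    for match in TOKEN.finditer(text):
        number, identifier, operator = match.groups()
        if number is not None:
            parts.append("<mn>" + number + "</mn>")
        elif identifier is not None:
            parts.append("<mi>" + identifier + "</mi>")
        else:
            # Escape characters such as "<" so the output stays well-formed.
            parts.append("<mo>" + html.escape(operator) + "</mo>")
    return "".join(parts)


print("<math>" + infer_tokens("1+2=3") + "</math>")
```

Rules like these only recover a flat run of tokens. Input such as x^2
comes out as three sibling elements rather than an <msup>, so an author
who wants real structure still has to write the tags or use a converter.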

Again, I'm *very* skeptical about this tag inference stuff, but you  
will need an HTML5 parser in the general case anyway.

Henri Sivonen

Received on Monday, 31 March 2008 09:45:08 UTC